00:15 EdB: for those interested, i've push a WIP MR fro clover printf suport : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6040
00:16 karolherbst: EdB: would be nice to skip the buffer when there is no printf
00:16 EdB: it's
00:16 karolherbst: huh? some llvm abi thing?
00:17 EdB: if there is no printf, there is not llvm.printf.fmt metadata. If there is no metadata I don't bind a buffer
00:17 karolherbst: ohh, I see
00:18 karolherbst: mhh.. wondering how that all works out with the spirv path
00:18 jenatali: The spirv path goes through nir, right?
00:18 karolherbst: yes
00:19 karolherbst: anyway.. I will probably test that tomorrow :D
00:19 karolherbst: should be a quick check
00:19 karolherbst: I am sure the translator explodes or something
00:19 jenatali: Yes, it does
00:19 karolherbst: :D
00:19 karolherbst: oh well
00:19 karolherbst: jenatali: I guess you do have patches for that?
00:20 jenatali: karolherbst: Yeah, haven't ported them to upstream yet though: https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/154
00:20 jenatali: Oh, the translator doesn't explode, just calls to printf return -1 ;)
00:20 karolherbst: jenatali: ohh.. I meant the llvm spirv one
00:21 jenatali: Oh, that translator's fine for printf
00:21 karolherbst: okay, cool
00:21 jenatali: It's just not implemented in nir, but works up to spirv
00:21 karolherbst: right.. I guess I can just pick your patches then
00:22 karolherbst: we probably have a different ABI though or something
00:22 karolherbst: uff
00:22 EdB: this implementation is based on AMD behavior
00:22 karolherbst: AMD as in?
00:23 karolherbst: ROCm or something else?
00:23 jenatali: karolherbst: The nir bits should be relatively ABI-agnostic, but yeah you'll probably need tweaks to how the printf intrinsic is lowered compared to what we did for CLOn12
00:23 EdB: as found in LLVM, so probably ROC
00:24 EdB: it's expected that the device put an id in the buffer followed by the values to be outputed
00:24 karolherbst: ohhh
00:24 karolherbst: mhhhh
00:25 karolherbst: yeah.. I'll figure it out
00:25 karolherbst: what I don't want to have is different implementations
00:26 EdB: on runtime you match the id with a format given during the compilation abd extract values from the buffer
00:26 karolherbst: jenatali: how does the format string look like you get?
00:26 jenatali: Yeah that sounds similar to what I did
00:26 karolherbst: is it the llvm one or is that different?
00:26 jenatali: The printf nir intrinsic I added takes a deref of the format string as the first arg
00:27 jenatali: You could manually index those to make it match llvm
00:27 EdB: the id is 4 bytes int
00:27 karolherbst: jenatali: I meant more the format string itself
00:27 jenatali: It's a string literal
00:28 karolherbst: sure, but I am wondering if that's simply the llvm metadata or if something (like the llvm spirv translator) adjusts that one somewhat or creates a new one or wahtever
00:28 jenatali: Nope, it's whatever the app wrote
00:28 karolherbst: mhhh
00:28 karolherbst: so it's different
00:28 jenatali: Hm?
00:28 EdB: llvm gives me : 3:1:4:a float %f
00:29 hanetzer: agd5f: around? your original patch to fix the tiled display setup no longer applies to kernel 5.7.7 or 5.7.10, and it seems the issue it was supposed to fix is not fixed.
00:29 Shibe: does radv not have VK_google_display_timing?
00:29 jenatali: Oh interesting, llvm parses it for you, then the spirv translator unparses it?
00:29 EdB: 3 is the id that will be in the buffer, 1 is the arg count, 4 is the arg size (it could be repeated), the last part is the format
00:29 Shibe: I found a patch in keith packard's mesa branch but it doesnt seem to have been merged
00:30 karolherbst: jenatali: no clue, that's what I am trying to figure out :)
00:30 jenatali: Ohh, sounds like the format part is still raw
00:30 jenatali: You can reconstruct the rest of that metadata from the spirv/nir
00:31 karolherbst: it doesn't seem like the translator reads out "llvm.printf.fmts" mhhh
00:32 karolherbst: jenatali: do you have a spir-v with printf somewhere?
00:32 karolherbst: also.. I'd like to have the opencl format parsing code in a more global position so we don't end up with... two versions
00:34 EdB: llvm.printf.fmts is only referenced by the AMD target, I don't know if there is another target to could output someting different
00:34 karolherbst: ehhh
00:35 curro: EdB: hah, that's really cool
00:35 EdB: thanks
00:35 karolherbst: EdB: is there no "generic" thing?
00:35 karolherbst:doesn't think that adding more AMD specific things to clover is a good idea
00:35 curro: is anyone from Valve around right now?
00:36 curro: while looking at some traces earlier this week i noticed extremely poor GPU and CPU utilization numbers in all games from the Steam store I've tried that use the Source engine, games which use other engines don't seem to be affected
00:36 hanetzer: is the gitlab slow to the point of not loading at all for anyone else?
00:36 curro: vblank_mode is forced to zero in my environment, running on an ICL laptop with recently updated Steam runtime
00:36 curro: the bottleneck doesn't seem to be in the graphics driver nor rendering thread, but in the engine's event loop. it spends most of the time in nanosleep() due to a call SDL_WaitEventTimeout() with a fixed timeout of 20ms. no SDL events are received if i don't touch the controls, which causes the framerate to be artificially capped at a really crappy value...
00:36 EdB: Yeah, but it's the only stuff I have around :/
00:37 karolherbst: EdB: strange...
00:37 curro: in order to confirm that I LD_PRELOAD'ed a libSDL2 with the following hack which causes SDL_WaitEventTimeout() to timeout immediately: https://people.freedesktop.org/~currojerez/sdl2_force_early_wait_event_timeout.patch
00:37 bnieuwenhuizen: hanetzer: not really slower than usual for me
00:37 karolherbst: EdB: do you have an example opencl file?
00:37 curro: FPS of all Source games I've tried goes through the roof, Dota2 improves by ~3x, CS:GO by ~4x, TF2 by nearly 8x with that patch. smells like a serious performance issue in the Source engine...
00:37 EdB: Yes
00:37 mareko: does any driver plan to use HMM?
00:38 karolherbst: mareko: nouveau already has userspace support for it
00:38 Kayden: Plagman: You may be interested in curro's findings ^^
00:38 karolherbst: it's all wired up with clover
00:38 karolherbst: and it does indeed work
00:38 karolherbst: even though there are some kernel bugs still
00:39 mareko: karolherbst: so nvidia has recoverable page faults,
00:39 mareko: ?
00:39 karolherbst: mareko: since pascal, yes
00:39 EdB: karolherbst: https://paste.centos.org/view/09dc734d <-- ready for piglit execute :)
00:39 karolherbst: EdB: would need to reboot as my GPU crashed.. :D but I just want to translate that to llvm -> spir-v
00:39 hanetzer: bnieuwenhuizen: literally unable to load the page here. trying to check for more action on https://gitlab.freedesktop.org/drm/amd/issues/781
00:39 gitbot: drm issue 781 in amd "Regression: DP outputs out of sync on dual-DP tiled 5k screen" [Amdgpu, Bugzilla, Opened]
00:40 bnieuwenhuizen: hanetzer: fine here, no new action since ~1 month
00:40 karolherbst: mareko: any specific reasons you are asking?
00:40 hanetzer: beh. situation's worse now for me.
00:41 karolherbst: anyway.. I added all the SVM functions to clover... the driver just needs to enable whatever the driver needs to do and set a CAP to enable it
00:41 mareko: karolherbst: it looks like we won't use HMM
00:41 karolherbst: it's pretty transparent if you go for the system SVM directly
00:41 karolherbst: mareko: ahh
00:41 karolherbst: mareko: I thought there were plans to use it?
00:41 mareko: karolherbst: yes, "were"
00:41 karolherbst: also.. why not? I doubt you'd get anything else into the kernel
00:41 karolherbst: or you just don't support that stuff at all?
00:42 karolherbst: sounds like a bad idea, but who am I to judge
00:43 airlied: mareko: it's not useful for graphics stuff
00:43 karolherbst: EdB: why that printf declaration though?
00:43 EdB: karolherbst: you can remove thr print_alloc that are AMD specif
00:44 karolherbst: EdB: huh?
00:44 jenatali: EdB: Was about to ask where __printf_alloc was getting called :)
00:44 EdB: karolherbst: because it's currently not in libclc
00:44 karolherbst: ehhhh
00:44 karolherbst: I see
00:45 karolherbst: airlied: right.. we can't even use it with graphics engines :D
00:45 karolherbst: it's compute only
00:46 EdB: jenatali: __print_alloc a fonction that is inserted during LLVM compilation with an AMD target when you call a printf
00:46 EdB: it's use the found where to write in the device memory bufer
00:46 jenatali: Yeah, makes sense
00:47 Plagman: curro: that doesn't seem expected, and probably not the experience of most users
00:47 Plagman: but, seems really bad!
00:47 Plagman: wonder why it's hitting those timeouts on your side
00:48 karolherbst: EdB, jenatali: https://gist.github.com/karolherbst/38e252b3f39fd7e2ce0f86d098004529
00:48 karolherbst: what a mess :D
00:49 Plagman: in fact, the normal baseline with cs:go is to be a bit quicker than windows in a lot of situations, so 4x perf is definitely not supposed to be on the table there
00:49 Plagman: is anything about your setup, particularly input, atypical?
00:49 Plagman: (is wayland involved at all?)
00:49 jenatali: karolherbst: The printed version of a literal string in spv is pretty bad...
00:49 karolherbst: :)
00:50 karolherbst: better ideas? :D
00:50 curro: Plagman: the event handling loop of the engine seems to call SDL_WaitEventTimeout once per frame with that fixed timeout, even though there seem to be no SDL events being generated ever since i'm not moving the mouse nor pressing keys while benchmarking
00:50 jenatali: Nah, if you want to support app-provided spir-v that's what you're going to get
00:50 curro: Plagman: i'm running on X11 right now, nothing particular about input
00:50 karolherbst: ohh.. maybe I should wire up kernels for spirv2nir :D
00:50 karolherbst: then I could even get the nir
00:50 karolherbst: jenatali: jep
00:50 jenatali: You'll need my patch before the printfs actually translate into nir, otherwise you'll just get some -1s
00:51 karolherbst: EdB: also... https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2078
00:51 karolherbst: we really should get this MR merged as well
00:51 curro: Plagman: by events generated "ever" i mean as soon as the game is up and running, there seem to be some events going through during start-up according to the printf's I added
00:51 jenatali: But yeah, basically you'd get printf(string, struct { args }), and then you're free to turn those strings into indices (since they're derefs), and lay out the args however you want
00:51 karolherbst: ahh.. some stuff was extracted: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5038
00:52 karolherbst: I should check it out actually
00:52 karolherbst: okay.. so at least the llvm ir looks sane
00:53 Plagman: curro: isn't that SDL loop in a side thread?
00:54 curro: Plagman: it seems to be in the parent thread, and the actual rendering thread seems to be synchronized with it at the end of every frame so it also gets slowed down
00:54 Plagman: i can spot check some stuff when i'm back at my desk but overall this seems unexpected
00:54 Plagman: since you'd always be running at no more than 30hz when not moving the mouse, right?
00:54 EdB: karolherbst: spirv and AMD doesn't seems to be on the same boat here
00:55 Plagman: or, 50 fps
00:55 Plagman: sorry
00:55 karolherbst: EdB: yeah.. in doubt I side with spirv :p
00:55 karolherbst: but
00:55 karolherbst: the translator is just doing whatever llvm is doing
00:55 karolherbst: sooo....
00:55 karolherbst: if you have target specific lowering then it should be target specific in mesa as well
00:55 jenatali: ^^ agreed
00:56 jenatali: EdB: What's the difference?
00:56 karolherbst: but the entire llvm path is more a "AMD" target than llvm anyway
00:56 karolherbst: soo...
00:56 EdB: spirv will have a greater memory
00:56 EdB: bigger
00:56 curro: Plagman: yes, but actual FPS i get out of the box was much worse than that before I applied the hack, since the actual time spent for each frame would be 20ms plus the actual rendering time
00:56 karolherbst: EdB: *llvm
00:56 karolherbst: it's not spirv doing it
00:57 karolherbst: it's llvm
00:57 karolherbst: or did I missunderstand anything here?
00:57 karolherbst: I was under the impression that the parsed strings was something AMD target specific? no?
00:57 EdB: yes
00:58 Plagman: most users are happily running dota and csgo at 200+ fps so we probably just need to figure out what's going south on your setup
00:58 Plagman: are you replacing the game's own build of libSDL?
00:58 curro: Plagman: I am now (in order to get sensible framerate), i was using the game's own before
00:59 EdB: but looking at what is into spirv , it's seems that all those constant string need to be copied into the print buffer
00:59 Plagman: very weird
01:00 jenatali: EdB: Or processed into metadata by the nir backend
01:00 EdB: unless nir extract them and create an id
01:00 Plagman: does it repro if you disable the steam overlay?
01:00 Plagman: you'd have to stomp LD_PRELOAD in the wrapper script maybe
01:00 karolherbst: jenatali: heh.. did you actually find printf documentation in the spir-v cl spec?
01:00 karolherbst: it doens't seem to be documented :D
01:01 jenatali: karolherbst: I know I did... one sec
01:01 curro: Plagman: not sure. what's the best way to disable it?
01:01 jenatali: karolherbst: https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html#_a_id_misc_a_misc_instructions
01:01 karolherbst: EdB: why?
01:02 karolherbst: you can do whatever in your driver
01:02 karolherbst: we could even parse it into the same format
01:02 karolherbst: jenatali: ohh, right
01:02 karolherbst: that makes sense
01:03 karolherbst: jenatali: mhh.. it seems to be valid spir-v to have a dynamic selection of the format string
01:03 karolherbst: which is... weird, but
01:04 curro: Plagman: just disabled it in the Steam settings screen, let's see what it does
01:05 kisak: curro: that might not be enough, since it still gets LD_PRELOAD'd in
01:05 jenatali: karolherbst: Yeah, but the the pointer has to be to a constant... according to the CL C spec it has to be to a constant defined in the kernel, but yeah SPIR-V doesn't necessarily call that out
01:05 karolherbst: jenatali: sure
01:05 karolherbst: but you can runtime select the string, that's what I meant
01:05 curro: kisak, Plagman: indeed that wasn't enough, doesn't seem to have any effect
01:05 jenatali: karolherbst: As far as I know, yeah you can runtime select the format string between one of multiple optons
01:05 jenatali: options*
01:06 Sachiel: in the past I had luck just deleting the .so for the overlay
01:06 karolherbst: EdB: https://gist.github.com/karolherbst/e9da6a2f060cc9d65af7c8fddea1af16
01:06 karolherbst: mind testing something like that?
01:07 karolherbst: but I guess it will just dispatch that correctly
01:07 karolherbst: and a, b turn into string ids
01:07 EdB: ok
01:07 kisak: curro: maybe open a terminal to csgo's folder and run something like ~/.local/share/Steam/ubuntu12_32/steam-runtime/run.sh ./csgo.sh
01:07 karolherbst: but what happens if the string is used for something else?
01:07 karolherbst: like you also write the string out into some output
01:07 jenatali: karolherbst: That's fine, it'll just also be embedded as an array?
01:08 kisak: curro: that would sidestep Steam's LD_PRELOAD of the steam overlay
01:08 karolherbst: hopefully, yes
01:08 karolherbst: jenatali: I am just wondering if we want to have a preparser in mesa emiting into the same format the AMD target has
01:08 karolherbst: or not
01:08 EdB: I think ther will be one string in the metadata and onther on in the kernel
01:08 karolherbst: then we can also share the code parsing the buffer
01:09 jenatali: karolherbst: Eh, up to you. I have a mild preference for getting the pointer value back in the runtime, rather than an ID, but it doesn't really matter to me
01:09 karolherbst: jenatali: are you copying the string at runtime then or how to you deal with that?
01:10 karolherbst:doesn't see how a pointer would help
01:10 jenatali: karolherbst: Nah, since it has to be a constant, it either needs to be embedded in the kernel (reflected out to the runtime via metadata) or passed by the app (which is probably illegal anyway?)
01:10 jenatali: From the pointer I can look up the string from one of those two sources
01:10 EdB: it runs with output: a float 7.000000, a vector 0,89,-34, hahaha
01:10 karolherbst: jenatali: mhhhhhhhhh
01:11 karolherbst: EdB: that seems wrong, no?
01:11 kisak: curro: the other option is to add LD_PRELOAD= %command% to the game's launch options
01:11 karolherbst: where are the two first writes
01:11 EdB: yes
01:11 karolherbst: or they failed as the args didn't match?
01:11 karolherbst: mhhh
01:11 karolherbst: questions
01:11 karolherbst: try to get a in the first, and b in the second
01:12 karolherbst: soo... pass in 8 as in[0]
01:12 karolherbst: jenatali: hold my beer
01:12 EdB: it give : https://paste.centos.org/view/41fb15ed
01:13 karolherbst: jenatali: clang laughs at you
01:13 EdB: It could also be bug on my side :)
01:14 karolherbst: jenatali, EdB: https://gist.github.com/karolherbst/697cf9805325d4a4fa542fdab9327c3c
01:14 karolherbst: third printf
01:15 karolherbst: the first two generate warnings as I think clang is smart enough to see that the args don't match _all_ possible formats
01:15 karolherbst: but it can't complain about the third one, can it :p
01:15 karolherbst: at least, clang doesn't reject it
01:15 karolherbst: which either scares me, or fascinates me
01:15 karolherbst: not quite sure
01:16 jenatali: Yeah looks like you can get SPIR-V to accept printf opcodes on the result of a select
01:16 karolherbst: sure
01:16 karolherbst: it's just poitner to constant memory :p
01:16 karolherbst: jenatali: be lucky that constant pointers are not castable to generic :p
01:17 karolherbst: or from rather
01:17 jenatali: LLVM at least has no problem compiling CL C to spir with a constant kernel arg as the format string
01:17 karolherbst: jep
01:17 jenatali: https://godbolt.org/z/1r6bE7 - didn't try sending that to spir-v though
01:18 karolherbst: jenatali: it's all in the gist already though :D
01:18 karolherbst: even the spirv
01:19 jenatali: You weren't sending a dynamic format string were you?\
01:19 karolherbst: sure
01:19 jenatali: Oh you were, look at that
01:19 karolherbst: the third printf: https://gist.github.com/karolherbst/697cf9805325d4a4fa542fdab9327c3c
01:19 jenatali: I misread
01:19 karolherbst: yeah
01:20 karolherbst: anyway....
01:20 karolherbst: _fun_
01:20 karolherbst: this could just foil the entire AMD target approach
01:21 karolherbst: _or_ is the runtime expected to preparse it and do even more magic?
01:21 jenatali: It's technically illegal according to the CL C spec though
01:21 karolherbst: questions
01:21 curro: kisak, Plagman: no luck after disabling the Steam overlay that way
01:21 karolherbst: jenatali: according to our interpretation? :p
01:21 EdB: but the kernel compile just fine :D
01:21 EdB: it just ignore line that it can't deal with :D
01:21 jenatali: The format is in the constant address space and must be resolvable at compile time, i.e. cannot be dynamically created by the executing program itself.
01:21 karolherbst: EdB: now insert trash through the arg and see how well the lowering code works :p
01:21 karolherbst: ahh
01:22 karolherbst: :D
01:22 jenatali: I'd say that's hard to misinterpret
01:22 karolherbst: jenatali: which doens't happen :p
01:22 karolherbst: it's not dynamically created
01:22 jenatali: Yeah but it's also not resolvable at compile time
01:23 karolherbst: what does that mean anyway :p
01:23 jenatali: Heh, ok fair
01:23 karolherbst:wishes stricter and clearer phrasing in specs
01:23 karolherbst: *for
01:23 karolherbst: "compile time constant" is a good phrasing
01:24 karolherbst: or at least... better
01:24 karolherbst: jenatali: you know what is annoying? do you know _when_ "compile time" is?
01:24 jenatali: karolherbst: I'd assume at clBuildProgram time? or clCompileProgram?
01:24 karolherbst: sure?
01:25 jenatali: I.e. before the args are set
01:25 karolherbst: "clCompileProgram does not need to wait for the compiler to complete and can return immediately once the compilation can begin." :p
01:25 karolherbst: and you know what I do to be annoying?
01:25 karolherbst: I don't complete compiling until the kernel gets enqueued :p
01:25 jenatali: I mean, same
01:26 jenatali: clCompileProgram just gets to spirv for me, clEnqueueNDRangeKernel is what actually goes through nir and dxil
01:26 karolherbst: so it's perfectly legal to postpone compiling until clEnqueueNDRangeKernel returns
01:26 karolherbst: right
01:26 karolherbst: so...
01:26 karolherbst: what does this "restriction" in the spec even means
01:26 karolherbst: because at this time.. you know the constant string
01:26 jenatali: Eh, if you don't pass a callback to clCompileKernel, then compilation is supposed to complete before returning
01:26 karolherbst: even if it's passed as an arg
01:27 jenatali: And as is, the compile callback has to fire before clEnqueueNDRangeKernel is called
01:27 karolherbst: mhhhhh
01:27 jenatali: (since you have to link it before you can execute it)
01:27 karolherbst:runs out of loopholes
01:27 jenatali: So according to the logical compilation rules in the spec it seems like it needs to be embedded in source to me :P
01:27 EdB: it's the same for AMD I think, kernel get another pass before been upload to device
01:28 jenatali: File a clang bug :D
01:28 karolherbst: :D
01:29 karolherbst: but yeah.. I guess people would still argue that they postpone final compilation until much later
01:29 karolherbst: so it's still technically correct or so
01:29 karolherbst: mhhh
01:29 karolherbst: but that's just... asking for trouble anyway
01:29 karolherbst: but also no problem for a runtime to deal with honestly
01:29 karolherbst: just save it somehow :p
01:29 jenatali: Yeah. W/e, I can handle the constant kernel arg ptr without a problem (I don't currently search those buffers for the format string pointer, but I could)
01:30 jenatali: Hence, I'd prefer if it stayed a pointer :P
01:30 karolherbst: yeah...
01:37 curro: kisak, Plagman: BTW, our performance CI being set up independently by craftyguy gets similarly crappy numbers in Dota2, so i doubt it's something specific to my setup
01:38 curro: the perf CI numbers for CSGO (which I believe is run in a different system) are much better though, which makes me wonder if it's some recent update to the Source engine which introduced this issue
01:40 curro: are you aware of any recent changes in your event handling code or in the synchronization between the event loop and rendering threads?
01:40 EdB: karolherbst: I do have warning at compile time : input.cl:54:35: warning: format specifies type 'int' but the argument has type '__constant char *'
01:41 EdB: printf(in[0] > 5 ? a : b, "I'm the one");
01:43 EdB: anyway, need to sleep now :)
02:13 Plagman: not aware of a change, will recheck recent dota here
02:40 agd5f: hanetzer, https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b7f839d292948142eaab77cedd031aad0bfec872
02:46 agd5f: mareko, definitely using hmm for ROCm
03:14 mareko: agd5f: I see
03:14 alyssa----: daniels: Joy, so the rk display exposes two PRIMARY planes, one for the big VOP, one for the little VOP.
03:14 alyssa----: Weston picks the first it finds, which is the little VOP.
03:14 alyssa----: But only the big VOP supports AFBC, so Weston is forced to use linear.
03:15 alyssa----: I'm not sure whose bug this is (kernel or weston)
03:15 alyssa----: Both sides are doing exactly what they are expected to, and that's not working.
03:17 alyssa----: To make things more 'interesting', each VOP has its own CRTC
03:17 alyssa----: So actually you need to force Weston to pick the big CRTC (the plane issue is a red herring)
03:18 alyssa----: That's CRTC ID = 1, so dumb algorithms that pick the first working will hit the little VOP. Forcing that CRTC works.
03:18 alyssa----: (I mean, the AFBC side is still totally broken. But this is forward progress.)
03:18 airlied: make crtc 0 the big bop :-P
03:19 alyssa----: airlied: that's one solution for sure =
03:19 alyssa----: =p
03:20 alyssa----:flashes back to trying to get rk3399 overlay planes working in mpv
03:21 alyssa----: anyway, now we're onto "everything is broken"
03:22 alyssa----: who's lying about supported afbc modifiers -- panfrost or the vop? stay tuned for scenes from our next episode
03:22 alyssa----: Answer: the daily double!
03:22 alyssa----: the vop
03:22 alyssa----: ugh
03:22 alyssa----: Guess who's writing a kernel patch a few minutes to midnight
03:22 alyssa----: This girl!
03:24 alyssa----: of course things on the GPU side are also broken in a dazzling number of ways, so I don't exactly have any high ground here ;p
03:24 imirkin: it's always near midnight _somewhere_
03:24 alyssa----: Mostly though I'm confused because this can't possible be conformant
03:25 alyssa----: otoh AFBC's spec is proprietary so... I guess conformance is "does it work for weston" :)
03:27 imirkin: sometimes random code _does_ produce the correct results
03:28 imirkin: there are some address calculations in nouveau that literally make no sense, and yet the hw is happy. who am i to argue.
03:28 alyssa----: =D
03:28 imirkin: add 2 unrelated things, and multiply them by a third unrelated thing. poof - there's your final address. enjoy.
03:29 alyssa----: beautiful
03:29 alyssa----: tomorrow's mess, adios
04:04 jekstrand: alyssa----: You're back on IRC!
04:25 hanetzer: agd5f: yeh, I have that, but the situation is 'worse?'; black screen; the auto-monitor thing shows up but I can't get the monitor to actually display something in x.org (in linux console I get two mirrored prompts on the monitor)
04:28 hanetzer: agd5f: dmesg spams '[drm] Mode Validation Warning: Vertical ratio prefetch failed validation.' while 4k mode is enabled.
05:30 airlied: robclark: I hate the fact that msm uses UTS_RELEASE can we stop?
05:30 airlied:has to rebuild msm every time
05:53 airlied: anholt: mapped buffers regressioned llvmpipe
06:26 daniels: alyssa----: ah yes. welcome to display. :)
06:27 daniels: alyssa----: Weston just blindly follows the CRTC -> connector assignment that was there before it, and that's been gamed such that the big VOP CRTC by default is routed to the external connector
06:27 daniels: because that will likely have a higher resolution
06:44 airlied: karolherbst: https://github.com/airlied/VK-GL-CTS/commits/opengl-cts-4.6.0 are I think the minimum I need to pass
06:44 airlied: I'm going to try and do the gerritt work for the tomorrow
06:45 airlied: I've one other fail on 4.6.0 branch with a robust_buffer test, will also track that down
06:50 airlied: karolherbst: I also filed issue/mr for the kc-cts fails I had
06:53 curro: Plagman: i just confirmed that the performance issue i reported earlier also affects the Vulkan renderer, doesn't seem GL-specific
07:31 MrCooper: mareko: do you mean how to add a job to the Mesa CI pipeline? Or what kind of pipeline do you want to add?
07:32 Akari: sorry if this isn't the right place to ask. Using OpenGL, is it possible to find out which compressed texture formats are supported natively by the GPU and which ones are emulated by the driver?
07:33 MrCooper: mareko: assuming the former, there's docs/ci/index.rst and the existing job definitions in .gitlab-ci.yml & .gitlab-ci/lava-gitlab-ci.yml to look at
07:54 danvet: airlied, can you do the backmerge that sfr describe for the duplicated case 12: lolz pls?
07:54 danvet: drm-tip no compiling right now
07:54 danvet: and the conflict is kinda nasty
08:23 jekstrand: danvet: :-( case 12:
08:23 jekstrand:just updated his drm-tip
08:23 jekstrand:goes back to bed and hopes it's fixed in the morning. :D
08:23 danvet: jekstrand, try again, I just fixed it
08:23 danvet: also airlied botched this one
08:24 jekstrand:starts his build again
08:24 danvet: well I fixed it 16' ago, I guess slow box compiling stuff
08:25 jekstrand: re-uping my dma-buf sync_file patches
08:25 jekstrand: I think I want half of them for real. I think I'm more-or-less un-convinced on the second half. :-/
08:25 tarceri: are these gitlab email confirmation emails legit?
08:27 MrCooper: tarceri: haven't got any
08:29 tarceri: MrCooper: the first one looks like this https://paste.centos.org/view/6bee8ea1
08:30 tarceri: the urls look legit, but the content seems like every other sketchy spam email you could imagine
08:32 jekstrand: I've not recieved one either
08:32 jekstrand: daniels: ^^
08:32 MrCooper: others on #freedesktop have though, and our GitLab did just get upgraded, so probably legit
08:33 daniels: yeah it got upgraded about half an hour ago
08:33 MrCooper: I guess it only affects users which confirmed an e-mail to specific versions of GitLab or something like that
08:33 EdB: tarceri: I got tis message this morning on the webpages : You have to confirm your email address before continuing. Please check your email for the link we sent you, or click 'Resend confirmation email'.
08:34 EdB: so I guess I will reeived one too
08:35 pepp: tarceri: the content really seems to come from gitlab (https://gitlab.com/gitlab-org/gitlab-foss/-/blob/c52b72f5772d52e9fc85bd9f4e8b8497a6278c37/lib/gitlab/background_migration/mailers/views/unconfirm_mailer/unconfirm_notification_email.html.haml)
08:36 EdB: strang I received to different email for 2 adresses on the same user
08:37 MrCooper: each address is confirmed separately, not really strange? :)
08:38 MrCooper: https://gitlab.freedesktop.org/profile says "Please click the link in the confirmation email before continuing.", but I haven't got an e-mail
08:39 EdB: the email content is differrent
08:39 EdB: One is "Use the link below to confirm your email address."
08:40 MrCooper: clicked the resend link, got the e-mail, successfully confirmed
08:40 EdB: The other is more verbose for the other adress
08:41 daniels: can confirm it's legitimate
08:41 MrCooper: from previous experience, GitLab tends to just drop e-mails on the floor instead of queuing them reliably, I suspect that happened to me (and likely many other users)
08:42 MrCooper: (if an e-mail can't be sent immediately for some reason)
08:43 EdB: ah ah now I can approve MR :)
08:44 EdB: it's mine, but still :p
08:47 MrCooper: approvals also cannot be required yet, but they're nice to have anyway
08:48 EdB: would be nice if it auto RB ;)
08:53 daniels: EdB: but to exactly match the previous semantics, we'd need for it to detect whether reviews applied to every single commit or just one, or whether they were a R-b/A-b/T-b/whatever
08:55 kusma: daniels: Do we really need that? When marge merges, she adds a link back to the MR, so there's a paper-trail about the disussion...
08:55 MrCooper: daniels: right, just need an AI comparable to a human brain ;)
08:55 daniels: MrCooper: and ideally less fallible than a human brain
08:56 daniels: kusma: yes, that's exactly my position - that R-b and A-b are so lossy as to be useless
08:56 MrCooper: yeah, no idea why we're still doing that
08:56 daniels: I understand the argument of 'I want to know who I should talk to about this commit', but otoh if you have the link to the MR and the discussion, surely before you go talk to someone about a commit, you'd be going back to read the previous discussion on the MR/issue to see if the question you're going to ask has already been answered
08:57 MrCooper: we've stopped >< this close before being able to merge any MR with one click (OK, and a few keypresses to select Marge :)
08:57 emersion: doesn't gitlab have a "approve MR" feature?
08:57 kusma: daniels: Yeah, I think the practical difference is kinda unclear (as in: when can I merge?), and just reading the dicussion seems to convey much more detail...
08:57 kusma: emersion: that's what we just got
08:58 emersion: ah, approvals, right
08:59 EdB: daniels: I was joking :). But this can be see as a global RB.
08:59 kusma: zmike: BTW-ish, you probably have enough commits in mesa that you should be able to get the permission to assign to marge yourself by now... I don't know what the process is now that we use gitlab, but it'd be worth looking into, I think...
09:01 EdB: I'm no sure about the RB process for Mesa now. It's seems that having a link to the MR is OK now but people still give RB, so it feels strange no to put them on commits
09:01 daniels: MrCooper: tbh I think it's busywork which just annoys contributors for no benefit other than cargo-cult of what was there before, which was to an extent cargo-cult of the kernel
09:02 daniels: kusma, zmike: the process for getting commit access (and this is probably also relevant for jenatali) is to mail mesa-dev@ and ask if it's ok
09:02 kusma: EdB: It's still required as per the path-formatting docs, but maybe we should change that: https://docs.mesa3d.org/submittingpatches.html?highlight=apply#patch-formatting
09:03 kusma: daniels: OK, thanks for chiming in... Yeah, probably a good idea for jenatali as well :)
09:03 EdB: yes, that why it feel strange, because now some people start think that having the MR ink is ok
09:05 Akari: How do i figure out which compressed texture formats are natively supported by the GPU using OpenGL?
09:05 Akari: i was hoping ARB_internalformat_query2 would have something but it doesn't seem like it...
09:09 MrCooper: EdB: all the information is in the MR; having to add the tags to the commit logs mostly just makes merging an MR more cumbersome, for no real benefit
09:10 EdB: I tend to think so
09:10 MrCooper: (BTW, the MR URL in the commit log is just convenience, the GitLab UI links back to the MR anyway)
09:11 lynxeye: MrCooper: but it locks you into the infrastructure, as well as needing to have access to the gitlab web interface. Sometimes it's really neat to just look at the git commit log and have all the information there.
09:12 MrCooper: the tags aren't all the information, just the distilled end product
09:13 MrCooper: anyway, if you guys are volunteering to add the tags to all Mesa MRs, be my guest ;)
09:54 daniels: yeah, it's not 'all the information' by any stretch - they don't actually include all the discussion
09:54 daniels: practically speaking, what are you going to do with the information that a certain person reviewed a certain patch?
09:55 daniels: the only thing I can see is pinging to talk to them, but tbh I'd consider it pretty rude if someone went to ping me about a commit I'd reviewed without going to at least read the discussion in the MR
09:56 tarceri: daniels: usually people use it to ping the author and reviewer after bisecting a regression. It's not always easy to fine a rb in a long MR discussion
09:57 bnieuwenhuizen: tarceri: why ping the reviewer and not just everyone you caqn find on the MR though? :P
09:58 bnieuwenhuizen: (or just the author, which would be what I'd do even with the r-b tags, unless that is a non-regular)
09:58 daniels: yeah, if I couldn't find the reviewer discussing the relevant section I'd probably discuss it with the author since they were the ones who actually wrote it ... ?
09:59 MrCooper: karolherbst: can't you do test issues like https://gitlab.freedesktop.org/mesa/mesa/-/issues/3306 in your forked project?
10:01 tarceri: bnieuwenhuizen, daniels: people are go on holidays, don't check things they are tagged in, move on from the project, etc All I know it people often tag me for something I have reviewed and I've help fix the problem in the past. I have no problem with that
10:02 daniels: sure; my experience has just been that if you step back and look at things from a clean slate, R-b/A-b tags inside a git commit message isn't really where I'd arrive at
10:02 daniels: it's pretty easy to scroll through and see who's been active in discussions, for driver things you can find out who to ping, etc
10:03 airlied: danvet: oh didnt tip build, will backmerge rc6 tmrw
10:03 airlied: thz for fix
10:07 danvet: airlied, once you backmerged just delete the drm-next fixup to make rebuild-tip work
10:07 danvet: I hope at least that's all it should take
10:16 karolherbst: MrCooper: ohh, yeah, that would have been a better way of testing stuff.. sorry about the noise
10:23 zmike: daniels: good to know, thanks!
12:49 tzimmermann: danvet, in atomic modesetting, cursors are just planes without special corner cases, right?
12:52 danvet: yeah it's purely a pointer for legacy usage
12:52 danvet: it might have special limitations, but so could any other plane
12:55 tzimmermann: danvet, i might need to temporarily disable an active cursor plane during a mode switch. is there a helper or a state flag?
12:56 danvet: there's helpers to shut down all planes
12:56 danvet: and then re-enable them
12:56 danvet: for use in crtc_disable/enable
12:56 danvet: probably what you want
12:56 tzimmermann: what's the name?
12:57 danvet: drm_atomic_helper_disable_planes_on_crtc
12:57 tzimmermann: thanks, i'll have a look
12:58 danvet: see also the kerneldoc for drm_atomic_helper_commit_planes
12:58 danvet: that explains how to use it all
12:58 danvet: maybe should add a link to the kerneldoc of the first function to point at this
13:02 tzimmermann: danvet, some context: ast appears to lock-up occationally during modeswitches since it has been converted to atomic modesetting. i cannot reproduce it if the HW cursor is disabled, so i suspect that the HW cursor has to be disabled while the modeswitch happens.
13:02 tzimmermann: it worked in legacy modesetting
13:03 tzimmermann: and with software cursors
13:12 danvet: tzimmermann, did the legacy code disable the cursor unconditionally?
13:12 danvet: iirc would need to be an explicit call somewhere
13:13 agd5f: hanetzer, sounds like you may be hitting some other issue. Can you open a ticket?
13:17 tzimmermann: danvet, i don't think so. at least i cannot see such a call in the old code
13:19 ajax: tzimmermann: istr a few other chips that require the cursor plane be disabled when the plane under it is (presumably something like: if the fifo from the underlay plane is drained and the cursor plane is trying to blend from it to scan out, you stall forever)
13:19 ajax: so that sounds like an entirely plausible theory
13:23 tzimmermann: interesting
13:23 alyssa: I see a big shiny Approve button show up on gl... does that mean Marge can do automatic r-b now?
13:23 tzimmermann: yet it worked in the old code
13:24 tzimmermann: i'll do some more testing
13:31 bnieuwenhuizen: alyssa: IIRC we disabled application of automatic r-b in Marge because it ended up stripping r-b tags if there was no approval. So even if the answer is "it can" I suspect it is not configured that way for now
13:34 karolherbst: alyssa, bnieuwenhuizen: I guess it also won't really work for bigger MRs where you touch a lot of different code and people would prefer to add tags only to certain commits
13:55 alyssa: bbrezillon: no fun :(
13:55 alyssa: thanks tho
13:55 hanetzer: agd5f: sure.
13:55 bbrezillon: alyssa: ?
13:55 hanetzer: agd5f: I assume you mean on freedesktop? where do you think it should go?
13:57 alyssa: bbrezillon: sorry bnieuwenhuizen tab complete
13:57 agd5f: hanetzer, https://gitlab.freedesktop.org/drm/amd/-/issues
13:58 hanetzer: geh. the gitlab never loads for me :>
14:04 kisak: gitlab.fd.o is working here
14:05 hanetzer: times out for me, which honestly is bullshit lol. 100mbit up/down fiber, no issues on any other site lol.
14:08 bnieuwenhuizen: hanetzer: what does traceroute say?
14:10 hanetzer: says slow af, 11 hops so far.
14:11 hanetzer: 15 hops with ips/domains, then 15 of '* * *'
14:43 alyssa: seanpaul: Looking into AFBC upstream stuff.. I see you merged https://gitlab.freedesktop.org/drm-hwcomposer/drm-hwcomposer/-/merge_requests/44 so maybe you know more than I do here
14:44 alyssa: In particular trying to figure out what the magic modifiers are for the RK3399 big vop's afbc
14:45 alyssa: Mainline set it to 16x16+SPARSE, but I'm not convinced that's entirely right - at least, it's totally broken with Panfrost
14:45 alyssa: If from mesa I override to force YTR on, it gets a lot closer to working (that is, it renders correctly - still have a host of Panfrost bugs to work through)
14:45 alyssa: Indeed, that patch implies that YTR is always used for RGB on Android (and perhaps ChromeOS?) hence the mix-up.
14:46 alyssa: However, I'm not entirely sure the GPU bit in question *is* actually YTR (as opposed to other possible AFBC flags), so I'm hesitant to submit a kernel patch which may prove to be totally wrong
14:47 alyssa: Also, that drm-hwcomposer patch implies that SPARSE and SPLIT are seen together on Android at least. It looks like rk3399 has SPLIT support that's not piped into mainline, and I don't know how to generate that from the GPU anyway. SPARSE alone is what the DDK uses for FBOs anyway.
14:48 robclark: airlied: I suppose you could disable CONFIG_DEV_COREDUMP and that would go away..
14:49 daniels: hanetzer: weird, it's been completely responsive all day and no-one else has reported any issues at all :\
15:02 bbrezillon: jekstrand, karolherbst: I have a problem with how vtn_local_store() deals with 'vectors treated as arrays' (https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/compiler/spirv/vtn_variables.c#L759)
15:02 hanetzer: daniels: perhaps there's something screwy
15:04 bbrezillon: jekstrand, karolherbst: this approach is broken if we have 2 threads, each thread accessing a different half of a vec2
15:04 daniels: hanetzer: i mean yeah, it sounds like there is :P
15:04 bbrezillon: because this RWM approach is not atomic
15:05 bbrezillon: so I was wondering what was the reason for not allowing this vecX as array[X] cast here
15:05 bbrezillon: (seems to work fine for us, but I suspect it's problematic on !CL shaders)
15:10 karolherbst: bbrezillon: alignment
15:11 karolherbst: vecs are vec aligned, arrays are member aligned
15:11 karolherbst: so if you cast from array to vec you might violate alignment rules
15:11 karolherbst: uhm.. base_type aligned not member
15:12 karolherbst: soo...
15:12 karolherbst: mhh
15:12 karolherbst: I have no idea how to even make it atomic
15:12 karolherbst: as some hw have to split the load/stores
15:13 karolherbst: anyway.. does CL promises vecs to be written atomically?
15:13 karolherbst: I am sure it doesn't.. but.. vloadn/vstoren are no atomc operations anyway
15:14 bbrezillon: well, the vstore_local test in the CTS seems to depend on that :P
15:14 karolherbst: then I say the test is wrong :p
15:15 bbrezillon: I'm not so sure, I mean the threads do not access the same portion of the memory
15:15 karolherbst: bbrezillon: if you give me a cl file, I can see what nvidia does
15:15 bbrezillon: and it's really on the ssbo store that we have a problem
15:15 karolherbst: ahhh
15:15 jenatali: SPIR-V has alignment annotations on the loads/stores
15:15 jenatali: The alignment annotation would correspond to the vector member alignment, not the overall vector alignment
15:15 karolherbst: jenatali: irrelevant with vloadn/vstoren
15:16 karolherbst: you do vector operations on _array_ alligned memory
15:16 karolherbst: not vector alligned one
15:16 bbrezillon: karolherbst: https://gist.github.com/bbrezillon/3df316a0ab77749e755b107d530a80c1
15:16 jenatali: Right, I know that
15:16 karolherbst: otherwise it wouldn't work on vec3 at all
15:16 jenatali: Oh, forgot we're not using libclc for this, we have custom vloadn/vstoren implementations
15:16 karolherbst: yes
15:17 karolherbst: and we used to just cast from array to vec
15:17 karolherbst: but that causes alignment problems
15:17 bbrezillon: yep, and the vstore() part is good
15:17 karolherbst: right.. it's just a vloadn issue :)
15:17 karolherbst: ohhh wait
15:17 karolherbst: no
15:17 karolherbst: well.. anyway
15:17 bbrezillon: not even vloadn()
15:18 jenatali: No, vstoren does cast from vec to array it looks like
15:19 karolherbst: as long as you cast to an array it's fine.. yes :)
15:19 bbrezillon: karolherbst: just thinking about it, the cast should be safe as long as you're casting to a sub-type, right?
15:20 Shibe: Hi, I was wondering why VK_google_display_timing isn't in mesa? I found a branch keith packard had with support for it in radv, but it didnt seem to get merged
15:20 bbrezillon: I mean, the alignment constraint of uint2 is stronger than uint[2], so casting from uint2 to uint[2] should be okay
15:20 karolherbst: bbrezillon: right.. so nvdia splits those
15:21 ajax: Shibe: you asked the question, then gave the answer ;)
15:21 karolherbst: bbrezillon: yep, that's what I meant with casting to an array
15:21 Shibe: ajax: yeah, but I couldn't seem to dig up why it wasn't merged
15:21 bbrezillon: so we should be able to relax vtn_local_store(), right?
15:21 karolherbst: bbrezillon: I think so.. yes
15:22 karolherbst: nvidia does split it up for odd reasons
15:22 bbrezillon: actually, I'm not even sure loading a vec in vtn_local_load() is needed
15:22 bbrezillon: but at least that one is not buggy
15:22 ajax: Shibe: insufficient thrust applied, plus that branch predates the gitlab migration and probably got forgotten in the shuffle
15:22 Shibe: ah
15:22 karolherbst: bbrezillon: ohh wait.. actually
15:22 karolherbst: nvidia does split it
15:22 karolherbst: now.. let me think
15:23 jenatali: bbrezillon: Should be safe to cast as long as the incoming alignment is <= element alignment, yeah?
15:23 jenatali: Eh, nah even that, I think we just have to cast it
15:23 karolherbst: bbrezillon: so uhm... the issue is, that the _target_ is still only single component aligned
15:23 bbrezillon: jenatali: isn't it always the case?
15:24 karolherbst: so you do a 64 bit load, but 2 32 bit stores
15:24 jenatali: karolherbst: If you want to modify a single component of a vector, regardless of whether it's 8bit components, you have to store in place by casting, you can't do a read-modify-write without it being atomic
15:24 bbrezillon: karolherbst: do you have an example
15:25 karolherbst: bbrezillon: the one you gave me
15:25 bbrezillon: we don't do 64 bit loads AFAICT
15:25 karolherbst: right..
15:25 bbrezillon: jenatali: yep, atomic RWM only work on scalar
15:26 karolherbst: bbrezillon: you can do an atomic write of a vec4 though :p
15:26 karolherbst: maybe not you, but some hw can
15:26 jenatali: karolherbst: But we're not talking about an atomic write of a vec4. We're talking about a write of a component of a vector
15:26 bbrezillon: well, not sure that in the SPIRV spec
15:26 bbrezillon: last time I looked all atomics were on scalar
15:26 karolherbst: jenatali: sure.. the write itself is atomic
15:27 jenatali: karolherbst: But the RMW isn't, which means it's not a valid implementation of storing to a single component of the vector
15:27 bbrezillon: just want to remind you that the test itself only expects the 32b writes to be atomic
15:27 bbrezillon: no atomic vec2 store expected here
15:27 karolherbst: jenatali: huh? what do you mean exactly is broken?
15:27 bbrezillon: it's VTN that decides to turn a uint store into a vec2 RWM
15:28 karolherbst: bbrezillon: still? I thought that got fixed
15:28 bbrezillon: hm, I'm not on master
15:28 bbrezillon: let me check
15:28 jenatali: bbrezillon: You linked master above: https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/compiler/spirv/vtn_variables.c#L759
15:28 bbrezillon: jenatali: right :)
15:29 bbrezillon: so I confirm it's not fixed
15:29 bbrezillon: at least not this case
15:29 karolherbst: bbrezillon: again "what is broken"?
15:29 karolherbst: I don't think those function guarantee atomic operation
15:29 bbrezillon: let me paste the NIR output
15:29 bbrezillon: (the result of spirv_to_nir())
15:29 karolherbst: so if we do per component writes that's totally legit
15:30 bbrezillon: SPIRV does
15:30 bbrezillon: but VTN doesn't
15:30 jenatali: karolherbst: If I have a uint2 in memory and write to one of the components, it shouldn't modify the other one
15:30 karolherbst: jenatali: that's not what vstoren/vloadn are about though
15:30 karolherbst: you operate on an array
15:30 jenatali: karolherbst: This isn't vstoren
15:30 karolherbst: and extract or store a vec
15:30 jenatali: It's just the implementation of store on a component of a vector
15:31 jenatali: The fact that it hits in a vstoren test is coincidental :)
15:31 karolherbst: we shouldn't touch the other component?
15:31 jenatali: karolherbst: We do a RMW
15:32 bbrezillon: https://gist.github.com/bbrezillon/6b8b4f54e2b29f9eb76c1775859f136b
15:32 karolherbst: jenatali: ... so you try to tell me, that if I have a vec2 n, and I do n.a = some_val; that causes a RMW?
15:32 jenatali: Yes, that's the code bbrezillon linked
15:32 jenatali: And I re-linked ;)
15:32 karolherbst: where does the code do that?
15:32 karolherbst: I don't see a single component vector write
15:33 jenatali: get_deref_tail checks for whether the SPIR-V deref chain is trying to access a single vector component as an array
15:33 jenatali: If so, then the local_store does a RMW
15:33 karolherbst: I am still at the CLC code though
15:34 jenatali: Right, the bug's just in vtn. The CLC just does a single-component write
15:34 karolherbst: or rather, I am trying to figure out what in the CLC goes wrong
15:34 karolherbst: okay
15:34 jenatali: The SPIR-V generates a single store with an access chain to the single component
15:36 karolherbst: mhhh
15:36 karolherbst: okay.. which nir operation do you think is incorrect?
15:36 jenatali: https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/compiler/spirv/vtn_variables.c#L759
15:36 karolherbst: no.. I meant in the printed nir, because I still don't see anything wrong there
15:37 jenatali: vtn_local_store when the access chain points a vector component, vtn does a RMW on the whole vector, which can undo external changes to non-modified vector components, since it's not an atomic RMW
15:37 karolherbst: I understand that you see this as an potential issue, I just don't see this happening
15:39 karolherbst: but yeah... that code does indeed load the full vector
15:39 karolherbst: and writes to a component
15:39 karolherbst: mhh, odd
15:39 bbrezillon: https://gist.github.com/bbrezillon/6b8b4f54e2b29f9eb76c1775859f136b#file-gistfile1-txt-L202-L211
15:39 jenatali: Thanks, was having trouble actually spotting the resulting nir of the RMW :P
15:40 karolherbst: uhh
15:40 karolherbst: yeah.. the code looks bogus
15:41 karolherbst: ehh.. loop unrolling :/ annoying
15:42 karolherbst: bbrezillon: mind disabling loop unrolling?
15:42 karolherbst: might be easier to read the code then
15:48 karolherbst: jenatali: ohhh.. I think I know what's up there
15:48 karolherbst: but still
15:48 karolherbst: I think this is just some optimizes trying to optimize the loop
15:49 karolherbst: soo.. I think it kind of tries to write both components of dp at the same time
15:49 karolherbst: but then there is still this single component write later on
15:49 karolherbst: ...
15:49 jenatali: It still doesn't matter what the CLC -> SPIR-V path does
15:49 karolherbst: something funky is happening there
15:49 jenatali: The SPIR-V does a single component write
15:49 karolherbst: ohhh.. llvm optimized the loop away
15:49 karolherbst: not nir
15:50 jenatali: Yeah, LLVM loves to do that :)
15:50 karolherbst: ehh...
15:50 karolherbst: wait...
15:50 karolherbst: so..
15:50 karolherbst: after the barrier we only have the write inside the loop, correct?
15:50 karolherbst: amd the loop does two iterations?
15:51 jenatali: Yeah, that sounds right
15:51 karolherbst: but yeah.. that at least seems fine in the spir-v as well
15:51 karolherbst: two stores, both single component
15:52 karolherbst: well.. those are pointers to uint anyway
15:52 bbrezillon: karolherbst: FWIW, this https://gist.github.com/bbrezillon/a9668aa3c062db3ec4a4a7638f5c4ca4 fixes the problem :)
15:52 karolherbst: right...
15:52 karolherbst: _but_
15:52 karolherbst: why is it a vec2 in the first place?
15:52 bbrezillon: it's not a vec2
15:53 bbrezillon: well, the base type is a vec2
15:53 jenatali: Probably LLVM removed the casts and just replaced the array indexing with vector component indexing
15:53 karolherbst: bbrezillon: it's not
15:53 karolherbst: CLC and spir-v do 2 uint stores
15:53 karolherbst: nir turned the first one into vec2, the second one is still uint
15:53 bbrezillon: the base type, as in, the arg passed to the kernel
15:53 bbrezillon: which is then cast to a uint array
15:53 karolherbst: bbrezillon: "dp[i] = sp[i];" is the code
15:54 karolherbst: ohhhh wait...
15:54 karolherbst: right
15:54 karolherbst: so.. we have this ugly cast...
15:54 karolherbst: mhhhh
15:54 karolherbst: mhhhh
15:54 karolherbst: crap
15:54 karolherbst: jekstrand: we have a bug in the cast optimization code
15:54 karolherbst: I think
15:54 bbrezillon: the cast is here https://gist.github.com/bbrezillon/6b8b4f54e2b29f9eb76c1775859f136b#file-gistfile1-txt-L14
15:55 karolherbst: right
15:55 karolherbst: but we shouldn't turn that into a vec2 store
15:55 karolherbst: no matter what
15:55 bbrezillon: that's done in vtn_local_store() :P
15:55 karolherbst: yeah.. I figured...
15:55 karolherbst: mhhh
15:55 karolherbst: but yeah.. I think I understand the issue now
15:57 karolherbst: jekstrand probably has a good idea on why that code is there
15:57 karolherbst: bbrezillon: mind printing the nir with gdb when you are inside the if?
15:57 karolherbst: but mhhh
15:57 karolherbst: something is odd
15:58 karolherbst: sadly I won't have time to debug this myself on what's actually going on there, I just understand the issue now :)
15:59 karolherbst: bbrezillon: I am actually more interested in what get_deref_tail returns and why
15:59 karolherbst: or maybe it just doesn't support casts yet
15:59 karolherbst: something like that
16:00 karolherbst: bbrezillon: yeah... I'd say that if the deref is a cast it should return deref as well, just like for arrays
16:00 karolherbst: bbrezillon: mind trying if that solves the issue as well?
16:01 jenatali: Oh, I think you're right, I think I misinterpreted what was going on and thought it was vector component indexing, but it's just also broken for casts
16:01 karolherbst: yeah...
16:01 karolherbst: I blame get_deref_tail :p
16:05 bbrezillon: karolherbst: sure, I can try that
16:05 jenatali: bbrezillon: I think I know why the RMW is there instead of just using your patch: https://gitlab.freedesktop.org/kusma/mesa/-/commit/07516dd5c86deb401f4e3a4221dae2f9e82d6f7b
16:05 jenatali: bbrezillon: Your patch will break for Vulkan shaders that try to do stores of a single component on a function_temp
16:05 karolherbst: bbrezillon: I just don't know how the fix should look like, would need to debug a bit and see what the starting deref is :)
16:06 karolherbst: but I assume it's the store_deref and the parent is the cast
16:06 karolherbst: _but_ it could be something else
16:06 karolherbst: anyway, whatever the cast is, I think we need to handle it there
16:06 bbrezillon: jenatali: well, in that specific case it's a deref on an ssbo
16:06 jenatali: I think for kernels, we can just let the deref chain access the single component, and lower_explicit_io will patch it all up for us, but for Vk shaders we either need a RMW or a cast to an array instead of a vector
16:07 bbrezillon: so it would still happen without your patch, right?
16:07 jenatali: The patch I linked allows vector indexing for all memory types, but only in kernels - vector component indexing for function_temp is still disallowed for Vk shaders
16:08 jenatali: (And I haven't even floated that patch by upstream folks I think :P)
16:08 bbrezillon: yes, but my point is, the code I pasted does a store on an SSBO
16:08 bbrezillon: so the problem exists for SSBOs already
16:09 bbrezillon: (I do understand that my patch is breaking Vk, BTM)
16:09 bbrezillon: *BTW
16:09 jenatali: Oh, sure
16:11 bbrezillon: so the proper fix would probably be to extract this condition in a helper and use in get_deref_tail() to select whether to return the parent or the leaf deref
16:12 jenatali: Yeah I'd buy that
16:17 karolherbst: bbrezillon: yeah.. sounds about right
16:27 bbrezillon: karolherbst, jenatali: https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/240/diffs?commit_id=1b03bb544491605c621e754f98c14828f5468de5
16:29 jenatali: bbrezillon: Yeah, looks reasonable to me. Probably want to share that list of modes with that assert so we don't have 2 places to change
16:29 karolherbst: bbrezillon: wouldn't it be easier to just check for casts? or is there another reason why this needs to be checked like this?
16:30 jenatali: karolherbst: You can array-index into a vector as well
16:30 karolherbst: sure
16:32 karolherbst: but why are those mem modes added?
16:32 bbrezillon: jenatali: well, the piece of code is in nir_validation.c
16:32 jekstrand: bbrezillon: That should only be used for local (shader or function temporary) memory.
16:33 jekstrand: Sorry, lots of backlog.
16:33 bbrezillon: jenatali: not sure there's much value in exposing a helper
16:34 jenatali: jekstrand: It's also used for shared it looks like
16:34 jenatali: Maybe that's the whole bug?
16:35 karolherbst: jenatali: well.. it's used on global, no?
16:35 jenatali: Oh, and the vload/vstore call them directly rather than going through _vtn_variable_load_store.....
16:35 jenatali: Where there's a nice comment about the RMW race
16:36 jekstrand: jenatali: That sounds like a bug
16:36 jekstrand: One day (and that day may be coming very soon), we may want to make array derefs of vectors work for local variables in NIR.
16:36 jekstrand: They don't today.
16:36 jenatali: jekstrand: Yup. I think there's two bugs - vload/vstore use the wrong load/store helpers, and vtn_pointer_is_external_block doesn't have shared
16:37 jenatali: jekstrand: Works if you convert them to scratch via explicit io ;)
16:37 jekstrand: jenatali: It does but vars_to_ssa will die in a fire.
16:37 jenatali: Ah. Not familiar with that one yet
16:37 jekstrand: I think vars_to_ssa is likely the only thing left that will die in a fire though.
16:37 jekstrand: Maybe if I just fixed it.....
16:37 bbrezillon: jenatali: but that piece of NIR doesn't come from the vstore() lowering
16:38 jekstrand: jenatali: It's possibly the most important pass in NIR. :) Maybe not top but top 5.
16:38 bbrezillon: and the problem really is on global memory in that case
16:39 jenatali: Oh, yeah, vtn_pointer_is_external_block is also missing global, look at that
16:39 jekstrand: Eesh. Yeah, global being missing is a problem.
16:39 jekstrand: I wonder how hard vars_to_ssa would be to fix.
16:39 jekstrand: I've glanced at it before, thrown up my hands, and walked away.
16:39 jekstrand: Maybe it's time to try again.
16:40 jekstrand: I've been doing lots of break-the-universe NIR reworks lately. What's another one, right?
16:40 jekstrand: cwabbott: You're probably on deck to review this one. :-P
16:41 karolherbst:wonders if supporting unstructured CF in all nir supported more of an annouence or someting people like :p
16:42 karolherbst: ohh btw.. I am quite close of being done with the vtn changes
16:42 karolherbst: just need time to do the final rewrite
16:43 jekstrand: cool
16:43 jekstrand: karolherbst: I may actually be motivated to review it for real now. :-)
16:43 karolherbst: :D cool
16:44 karolherbst: I am still waiting for the day where I don't have random stuff coming up and I can focus on rewriting that stuff
16:44 MrCooper: bnieuwenhuizen: is the meson-clang job going to pass for https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5600 this time? :)
16:45 bnieuwenhuizen: MrCooper: hey, I've been fixing actual intrinsics that clang fails on but gcc doesn't ...
16:45 jekstrand: karolherbst: Those days don't come on their own. You take them by force.
16:45 karolherbst: jekstrand: yeah.. I know :/
16:45 MrCooper: bnieuwenhuizen: I mean, have you verified locally that it works now?
16:46 karolherbst: but the weather is also annoying and it just gets to warm. so my concentration is pretty much done for these days anyway
16:46 bnieuwenhuizen: MrCooper: not with clang
16:47 jekstrand: karolherbst: Tell me about it. It's been getting > 35C/95F every day here for the last few weeks. :-(
16:48 karolherbst: jekstrand: I'd die
16:48 jekstrand: The A/C keeps up most of the time but not in the room with all the computers. :-(
16:48 bnieuwenhuizen: sigh, another failure, tmie to figure out how to build with clang ...
16:48 MrCooper: bnieuwenhuizen: CI is no substitute for local testing... at the least, you can manually trigger only the needed jobs until it's fixed, then assign to Marge after that
16:48 karolherbst: jekstrand: I guess you also have this deadly 35C + 80% humidity combo :p
16:49 jekstrand: karolherbst: Not here. We're far enough in-land that we don't hit 80%. It tends to max out at around 60% here which still isn't great when it's 35-40C.
16:49 bnieuwenhuizen: MrCooper: how is it not a substitute for configs I don't have locally?
16:49 karolherbst: right...
16:49 jekstrand: I did spend a summer in Houston and that was 40C + 90-100% humidity all summer long. It was torture.
16:50 karolherbst: :/
16:50 karolherbst: yeah.. that sounds painful
16:50 TheRealJohnGalt: Welcome to South Florida 🤣
16:50 jekstrand: You learn to deal with it. Mostly by staying in-doors with the A/C blasting.
16:50 jekstrand: TheRealJohnGalt: I imagine that Houston and south FL aren't that different, climate-wise.
16:51 jekstrand: About the same latitude and both right on the ocean.
16:51 TheRealJohnGalt: Yeah, I really don't think they are. We even get hit by many of the same tropical depressions.
16:51 jekstrand: The other advantage of being 300 miles in-land: No hurricanes. :)
16:51 TheRealJohnGalt: Only difference *may* be UV index. Otherwise should be the same.
16:52 TheRealJohnGalt: Talk to a floridian anywhere, they'll say "the hurricanes never hit us" lol.
16:52 jekstrand: We get the tail end of them but by that point, it's just lots of rain.
16:53 TheRealJohnGalt: yeah, that's not bad at all.
16:53 MrCooper: bnieuwenhuizen: there can be cases like that, but clang is hardly one
16:54 jekstrand: karolherbst, jenatali: I just threw a patch to delete all the insert/extract stuff from vtn_load/store_local. Once I see where it blows up, I'll start fixing things and see what it'll take to get rid of it.
16:55 jenatali: jekstrand: Cool, sounds good
16:55 jenatali: jekstrand: I'm trying out your suggestion of tweaking the libclc mangler to operate on vtn types instead of glsl types so we don't need a glsl type for events
16:55 jenatali: I'll let you know how it goes
16:56 jekstrand: jenatali: As I said, I *think* we should be mostly ok except for vars_to_ssa but we'll see.
16:56 jekstrand: jenatali: Cool. SPIR-V -> NIR has become more and more of its own IR over time. It used to attempt a pretty direct translation but at this point it's got its own concept of SSA values, types, a control-flow graph and basically everything else except actual instructions.
16:58 karolherbst: jekstrand: fun fact.. pmoreau started OpenCL support for nouveau by translating spir-v to codegen directly and when I picked it up I was like: nope.... not with me :D
16:58 jekstrand: karolherbst: Yeah, I remember that, vaguely.
16:58 jekstrand: karolherbst: IIRC, I thought he was crazy
17:00 karolherbst: at some point I still had to defend on why using nir is the better option, but pmoreau was fine with the new path quite quickly... uff
17:00 karolherbst:wants to get rid of all TGSI in novueau anyway
17:00 jekstrand: Die, TGSI, die!
17:00 jekstrand: I don't think anyone actually *likes* TGSI
17:01 jekstrand: Other than maybe the Nine people
17:01 imirkin:is a big fan
17:01 karolherbst: :D
17:01 jekstrand: and imirkin :P
17:01 karolherbst: at least now we get real testing on our nir path at least
17:01 karolherbst: so.. I guess the it won't be considered untested in a year anymore
17:05 imirkin: i like it coz it's simple and good at what it does -- communicating an IR from one place to another.
17:09 jenatali: jekstrand: Ugh, LLVM is bad at keeping signedness, which makes it difficult to mangle names correctly. There's some hacks right now to remap types for certain known functions to fix the signedness. If I want that to work using vtn types, I need to find or fabricate a vtn type for int or int vectors
17:09 jenatali: Any existing helpers for that? Or is that just a terrible idea?
17:12 jekstrand: Why do you need to fabricate types?
17:12 jenatali: For clarity, CLC calls a function with int. LLVM loses the sign, but mangles both caller and callee with int. SPIR-V has uint, and converts the mangled call site to an opcode. We're converting the opcode back to a mangled name, which means we need int instead of uint for mangling purposes
17:12 jekstrand: I mean, it's possible
17:12 jekstrand: jenatali: Is it consistent in that it always stomps everything to signed?
17:13 jenatali: It stomps to unsigned, yeah
17:13 jekstrand: Wait, I'm confused.
17:13 jekstrand: SPIR-V has uint but it's mangled as signed int?
17:13 jekstrand: Is it just that LLVM doesn't make a distinction when mangling?
17:13 jenatali: Yeah. LLVM mangles the function names based on the types in the source, which is signed int
17:14 jenatali: But LLVM doesn't have different signed/unsigned int representations, just e.g. i32
17:14 jenatali: So SPIR-V's int types are all unsigned
17:15 jekstrand: jenatali: So do you need to create int types or just make the mangler not care?
17:15 jenatali: So e.g. SMad opcodes end up with sources that are of type unsigned int
17:15 jenatali: The mangled name needs to have signed int ('j') in order to find it correctly in the libclc library
17:16 jekstrand:is very confused
17:16 jekstrand: You can create vtn_types. The only problem is that we have no caching of them today so if you create an int type many times, you'll have many objects.
17:16 jekstrand: Probably not actually a problem
17:18 jenatali: Maybe this helps: CL has a mad() function with signed and unsigned overloads. SPIR-V has different opcodes for signed/unsigned (UMad/SMad). If you look at the arguments passed to either of those, the SPIR-V types are all uint, even for SMad
17:18 jenatali: If I want to call the int overloads of mad(), I need to mangle SMad to have signed types in the name, and UMad to have unsigned types in the name
17:18 jekstrand: Right
17:19 jenatali: If I just naively mangle based on what SPIR-V gives me, I'll only ever call the unsigned versions
17:19 jekstrand: But you're getting unsigned types in the SPIR-V even for signed opcodes and you need to fix them up?
17:19 jenatali: Yep
17:19 jekstrand: Ok, that makes sense.
17:19 jekstrand: And is annoying
17:20 jenatali: Indeed :)
17:20 jenatali: Example: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6035/diffs?commit_id=c62e8687b27c7d1bb9b136fee1d328adc9f66755
17:20 jekstrand: Sounds like you just need to write a little vtn_type_to_signed helper which creates a new vtn_type for you.
17:21 jekstrand: We don't currently have a "get me a vtn_type for this glsl_type" helper
17:21 jenatali: Ok, as long as that doesn't sound awful
17:21 jekstrand: At least not that I remember.
17:21 jekstrand: Nah, it's fine.
18:00 jekstrand: jenatali, bbrezillon: I've got a branch with the vector insert/extract stuff and it's passing at least one test which uses it. The vars_to_ssa changes weren't nearly as bad as I though. We'll see what other passes blow up.
18:00 jenatali: :)
18:01 jekstrand: I *think* it should work on inputs/outputs because it has to work for TCS outputs somehow. But I can't find the code for that so I don't know...
18:02 jekstrand: Kayden: Do you remember how that's handled?
18:05 jenatali: Ugh... Thought I could simplify the address space mangling, but that'd require not only synthesizing int types, but also pointer-to-int types
18:13 jekstrand: :(
18:32 jenatali: Ok that's still cleaner than it was though, so I'm happy
18:42 anholt: jekstrand: aww, looks like the current liveness doesn't have the property I expect
18:46 anholt: ... or maybe I fell asleep before finishing the pass and forgot to write the bit using that the next day. oops.
20:03 Kayden: jekstrand: reading back and struggling a bit to see the exact question, but... for TCS vector outputs, I believe multiple threads are allowed to each write a vector component, then barrier, and once everyone passes the barrier, all 4 writes will be visible and nobody trashed each other.
20:05 Kayden: At the very least, I know we saw implementations allow multiple threads to simultaneously write components of a vec4 output and it worked out coherently
20:05 Kayden: and I am pretty sure that there are tests - but I don't recall if they're CTS tests, or piglit tests that we wrote
20:06 imirkin: Kayden: TCS can only write to its own native invocation's outputs
20:06 imirkin: (+ patch vars)
20:06 Kayden: Yeah. So an out patch vec4
20:06 imirkin: however it can *read* other invocations' outputs
20:07 Kayden: pretty sure that 4 TCS invocations are allowed to write out.x, out.y, out.z, out.w, without a barrier, and it works
20:07 imirkin: yes
20:07 imirkin: a random one's writes will end up there though
20:07 Kayden: well no
20:07 imirkin: or you mean invoc 1 writes out.x, invoc 2 writes out.y?
20:07 Kayden: yes
20:07 imirkin: that would be legal as well, afaik
20:08 imirkin: but it's an interesting question. i dunno where the "unit" of atomicity is specified
20:08 imirkin: i.e. what's legal to implement a rmw and what's not
20:08 Kayden: invoc 0 writes x, invoc 1 writes y, invoc 2 writes z, invoc 3 writes w gives you a complete vec4 with defined values and no races
20:08 imirkin: tbh i dunno if that's guaranteed by the spec. that's definititely hw it would work on nvidia hw though
20:09 Kayden: Yeah, I don't remember either
20:09 Kayden: NV definitely worked there, we made it work on Intel too
20:09 imirkin: on nv, the output components are all independent
20:09 Kayden: I forget if there are CTS tests
20:09 imirkin: you'd have to do lots of work to make it fail :)
20:09 Kayden: ahh.
20:09 Kayden: I do wonder if that's guaranteed everywhere for say 16 or 8 bit values
20:10 Kayden: because I could see some HW being able to do atomicity at a 32-bit granularity
20:10 airlied: I remember falling over the float a[4] case which does a[gl_InvocationID]
20:11 airlied: not sure I remember seeing it for a vector
20:11 Kayden: ah yes, "patch out float a[4]" with a[gl_InvocationID] is *definitely* in the tests and falls over.
20:35 anholt: tomeu: how are you debugging the 403 errors for tracie+minio? would love to get freedreno uploading images.
21:19 ajax: why the crap are dri3_bind_context and friends boolean-inverted for return value from eglMakeCurrent and glXMakeCurrent
21:23 jekstrand: ajax: Because we're massochists
21:26 ajax: it's taking a lot of maturity not to rewrite literally everything about the driver binding layer (yet)
21:27 Sachiel: doooooooo iiiiiiiit
21:27 Kayden: "yet"
21:27 Kayden: :)
21:28 jenatali: Hm... if I somehow broke freedreno Vk, and don't have a device to debug... what do?
21:35 anholt: jenatali: link?
21:35 jenatali: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6035
21:37 anholt: jenatali: if you've got a crash, you may have luck with src/freedreno/drm-shim to run the deqp trace. (Would sure be nice if we captured the stderr of a crash in the runner and saved it). but let me take a look.
21:38 jenatali: Thanks. I didn't see any non-CL SPIR-V changes that looked like they could've caused the crash
21:42 anholt: getting a null deref in vtn_handle_builtin_call()
21:42 anholt: on strncmp
21:42 jenatali: Ah... anonymous functions
21:42 jenatali: Thanks!
21:44 anholt: wonder if capturing a core dump from the deqp runner would be of much use.
21:44 anholt: (thinking for the next time an error like this happens)
21:55 jekstrand: I think I wrote a lowering pass once to get rid of array-deref-of-vec
21:55 jekstrand: I wrote it for TCS output
21:55 jekstrand: Now I can't find it. :-(
21:57 jekstrand: No, I seem to have just fixed the GLSL one. :-(
22:32 jekstrand: bah! found it.
22:32 jekstrand: After rewriting it
22:32 jekstrand: :-(
22:32 jenatali: Better late than never?
22:32 jekstrand: I guess