00:04 karolherbst: jekstrand: sure, but mem_constant would be considered constant as well
00:04 karolherbst: there are ust a few cases where it actually makes sense not using ubos which are just super annoying to deal with :/
00:05 karolherbst: airlied: how does l0 deal with all the __constand* madness?
00:07 airlied: karolherbst: the API is pretty much same as CL in that area
00:07 karolherbst: mhhh
00:08 karolherbst: how did l0 do SVM again?
00:08 karolherbst: is that all explicit or can you bind host memory to essentially everything?
00:10 airlied: they have shared allocations that can be migrated
00:10 karolherbst: OpOrdered and OpUnordered are also super useless instructions aren't they?
00:10 karolherbst: ...
00:10 jenatali: Yep
00:10 karolherbst: wondering if there is hw at all which can do it in one go
00:11 karolherbst: mhhhhh
00:11 jenatali: Let's not think about it too much unless someone complains that their hardware isn't getting a native instruction for it
00:12 karolherbst: I am wondering if we can actually do better
00:12 karolherbst: mhhh
00:12 karolherbst: jenatali: fne is fneu by definition
00:12 karolherbst: but that makes it fine I think?
00:13 karolherbst: I should have thought about it earlier
00:13 pendingchaos: karolherbst: v_cmp_u_f32 and v_cmp_o_f32 on AMD hardware
00:13 jenatali: Right
00:13 airlied: karolherbst: but lvl0 has no concept from what I can see of special constant memory
00:13 airlied: everything is just memory
00:14 karolherbst: jenatali: I think right now it's not implementable actually... soo.. fne will be an unordered operation as otherwise you just end up with a broken driver...
00:14 karolherbst: ehhh
00:14 karolherbst: unordered operation are annoying
00:15 jenatali: karolherbst: What's the concern?
00:15 karolherbst: nothing.. just have to remind me again on how that works.. so fneu on unordered sources is true
00:15 karolherbst: so yes.. unordered does the correct thing
00:15 jenatali: Yeah I can never keep it straight
00:16 jenatali: Right now NIR just implements unordered ops, and vtn adds extra ALUs to make things ordered
00:16 jenatali: In theory, DXIL could benefit (as could AMD it sounds like) from having separate ops for ordered
00:17 jenatali: Sounds like a good thing for the backlog though, probably not a huge improvement in practice
00:17 karolherbst: nv can as well :p
00:17 karolherbst: I had patches and it isn't even that annoying
00:17 karolherbst: oh well...
00:17 karolherbst: uhm..
00:17 karolherbst: I meant unordered comparisons
00:17 karolherbst: but that's another story
00:18 karolherbst: just marge it :p
00:18 jenatali: I still can't ;)
00:19 airlied: I think we should fix that
00:20 airlied: jenatali: file an issue asking for commit access please
00:20 jenatali: airlied: Can do
00:21 jenatali: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3433 - not sure if there's a particular format I should've followed
00:22 karolherbst: mhhh.. what should I upstream next.. mhh
00:22 airlied: jenatali: switched you to developer
00:22 jenatali: Thanks! :D
00:23 airlied: I think the rules you already know, but don't merge direct, use marge, get reviews
00:23 jenatali:nods
00:23 karolherbst: 64 bit atomics would be fun
00:24 FireBurn|Home: Would they be fun though?
00:24 karolherbst: yep
00:24 karolherbst: I already wrote the lowering pass to CAS
00:24 karolherbst: so.. really not that much
00:25 karolherbst: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/bc3f8f85ce804db934988242a9a1ebd43a3a70fc
00:29 airlied: karolherbst: is the trello still maintained?
00:29 karolherbst: ehhh...
00:29 karolherbst: not by me
00:30 airlied: I suppsoe just pulling all the opencl and clover tagged MR might be enough for now
00:30 karolherbst: yeah
00:30 jenatali: airlied: Not sure what you're tracking, but you might also be interested in the CL items under https://gitlab.freedesktop.org/mesa/mesa/-/issues/3047
00:31 airlied: jenatali: just a what needs to land for 1.2 kinda thing
00:31 jenatali: airlied: Then that list should be tracking what's missing from vtn/nir at least
00:32 karolherbst: airlied: I think the most annoying bit is fixing the API
00:32 karolherbst: there are quite some annoying issues
00:32 karolherbst: like the compilation callback stuff
00:33 FireBurn|Home: Are there any efforts on going to get OpenCL in better shape in Mesa?
00:33 airlied: FireBurn|Home: all of this discussion is that -P
00:33 FireBurn|Home: :D
00:33 FireBurn: Hussah
00:34 airlied: karolherbst: for proper fixes for that sort of thing, do we need gallium changes?
00:34 jenatali: Still feels a bit odd that this is happening in #dri-devel, considering how little we're talking about dri
00:34 karolherbst: for the API? yes
00:34 karolherbst: quite a lot actually
00:34 karolherbst: we have to rework how shader compilations are done
00:34 Sachiel: jenatali: drivel related indirectly
00:34 airlied: karolherbst: okay this is the compiler one kernel vs all the kernels?
00:34 karolherbst: nope
00:35 karolherbst: compile before launching
00:35 karolherbst: to the final binary
00:35 karolherbst: like final final
00:35 airlied: oh so no compiling at dispatch time
00:35 karolherbst: right
00:35 karolherbst: this has.. implications
00:35 karolherbst: at clkernel creation time you don't have a queue
00:35 karolherbst: so you also have no context
00:36 karolherbst: so we either have a compilation only context and share objects with all
00:36 karolherbst: or... we compile on a screen
00:36 karolherbst: anyway... annoying
00:36 airlied: I think a compile only ctx is fine
00:36 karolherbst: right
00:36 karolherbst: then you have drivers which can't share the state objects
00:36 karolherbst: like r600 and radeonsi I think
00:37 airlied: PIPE_CAP_SHAREABLE_SHADERS for veryone
00:37 karolherbst: yep
00:37 karolherbst: but we also need a bit better apis
00:37 karolherbst: like the driver needs to tell us the max threads per compiled kernel
00:37 karolherbst: and so on
00:38 airlied: ah so we need info retrieveal apis for shaders?
00:38 karolherbst: yep
00:38 karolherbst: right now we more or less hardcode the max threads of a kernel
00:38 karolherbst: but we can't do that
00:38 karolherbst: as.. more regs means less threads
00:38 karolherbst: so this will be a bit of work already
00:39 karolherbst: airlied: this is for clGetKernelWorkGroupInfo
00:39 karolherbst: but I suspect big kernels using a lot of regs won't work in nouveau
00:39 curro_: both options you guys are talking about sound reasonable. another possibility would be to delay compilation until a real context is present and a kernel is launched, then cache the result in the program object for other kernel entry points
00:39 karolherbst: and r600/radeonsi assume worst case and don't dispatch many threads
00:39 karolherbst: it's painful
00:39 karolherbst: so either you risk running into errors or your perf sucks
00:40 karolherbst: curro_: can't do that
00:40 karolherbst: we need the result for clGetKernelWorkGroupInfo
00:40 jenatali: karolherbst: Make it up?
00:40 karolherbst: or at least I think it was this API...
00:40 karolherbst: yeah.. "CL_​KERNEL_​WORK_​GROUP_​SIZE"
00:41 karolherbst: you can only know this when you have the final binary
00:41 jenatali: karolherbst: That API takes a device - if you haven't launched that kernel on that device, compile it on-demand
00:41 karolherbst: sure
00:41 karolherbst: but you still have no context :)
00:41 jenatali: Ah, sure
00:41 karolherbst: clover creates contexts per queue
00:42 karolherbst: so if you have a compiler only context, you have to be able to share the state between contexts
00:42 jenatali: FWIW, I ended up creating a global context per device, and having each queue drain into a software-only pool that's processed on a per-device thread
00:42 airlied: or give the same serialized nir to the second context
00:42 airlied: and hope the driver does the msae thing with it :-P
00:42 karolherbst: airlied: ehhh...
00:42 karolherbst: right.. best case it's cached anyway
00:43 karolherbst: but there are also fun bits like CL_​KERNEL_​LOCAL_​MEM_​SIZE and CL_​KERNEL_​PRIVATE_​MEM_​SIZE
00:43 karolherbst: so we need to retrieve some more info :)
00:43 karolherbst: but yeah
00:43 karolherbst: clGetKernelWorkGroupInfo operates on the final result
00:45 karolherbst: airlied: anyway.. I'd would like to get test_basic to work first and then we can split up the remaining test classes and fix them one by one or so
00:46 curro_: karolherbst: ah, right, yeah that sucks, means you really need a per-device compilation context (for NIR devices at least, LLVM back-end ones shouldn't care) *unless* you're willing to return a conservative value for that query, which sure might be mildly suboptimal but technically compliant
00:46 karolherbst: api isn't _that_ terrible and we have a a CL 1.2 MR for the api bits already
00:46 karolherbst: curro_: r600 and radeonsi do return conservative values today afaik
00:46 karolherbst: but yeah...
00:46 karolherbst: it's annoying
00:47 karolherbst: but clover also needs to retrieve back results on the compilation anyway
00:47 karolherbst: which it also doesn't do at this point
00:47 karolherbst: mainly because there is no API for that :)
00:48 karolherbst: curro_: what I think is the best solution is, that we simple require drivers to be able to serialize their binaries
00:48 jenatali: karolherbst: FYI for test_api, I think I already linked you a translator issue that'll probably trip you up
00:48 karolherbst: this would also allow for a proper implementaiton of the shader binary API
00:49 curro_: karolherbst: yeah, that sounds fine to me
00:49 airlied: most drivers should do that already for GL binary api
00:49 karolherbst: airlied: nope, we return nir/tgsi for that afaik
00:49 karolherbst: never the driver binary
00:49 karolherbst: there is no API for that anyway
00:50 karolherbst: we'd need to add a DRIVER_BINARY binary type or something to deal with it
00:50 karolherbst: but I think that's probably the best solution
00:50 karolherbst: forces drivers to implement shader caching :p
00:50 karolherbst: so win-win
00:51 curro_: or it relieves them from implementing shader caching if the state tracker can just query the binary and cache it for them?
00:51 jenatali: ^^
00:51 karolherbst: or that
00:51 karolherbst: but some drivers also generate variants on the fly
00:51 karolherbst: but yeah
00:51 karolherbst: that would be one benefit
00:51 jenatali: On Windows, we chose to provide callbacks from the runtime to cache variants
00:52 karolherbst: so we could add a new api: get_compiled_shader_state returning a struct with the binary + size + some stats we need for clover
00:53 karolherbst: and we could call it on programm states
00:53 karolherbst: ehh.. pipe_shader_state
00:53 karolherbst: or pipe_compute_state
00:53 karolherbst: simply annoying we have two structs here :D
00:53 karolherbst: s/simply/mildly/
00:54 karolherbst: mhhh...
00:58 curro_: yeah, or you could add a param query hook you pass a compute state to and some enum specifying whether you want the binary back or whatever other metadata it needs from the driver
00:59 curro_: kind of like get_param and friends but per-context
00:59 karolherbst: curro_, airlied: https://gist.github.com/karolherbst/9cd9142b7074b82d93b70da16142bf45
00:59 karolherbst: ehhh
01:00 karolherbst: now it is correct
01:00 karolherbst: and then we have the same interface for all *_state_create functions
01:01 karolherbst: and pipe_binary_program_header is this "blob + size" struct
01:01 karolherbst: being serialized nir or whatever
01:01 karolherbst: or LLVM or final binary
01:01 karolherbst: mhhh
01:02 karolherbst: maybe we should keep it splitted up :D
01:02 karolherbst: but we could share the union
01:02 karolherbst: pipe_compute_state.prog is quite annoying to deal with
01:03 karolherbst: being void* and such
01:03 karolherbst: oh well...
01:04 karolherbst: maybe I play around with that a little
01:08 airlied: it's a bit tricky I think in placesw where drivers use nir->Tgsi etc
01:08 airlied: they may store the tokens in the templ
01:08 airlied: definitely needs checking
01:08 airlied: otherwise yes merging those things would be nice
01:10 karolherbst: yeah.. maybe
01:11 karolherbst: airlied: I think I would keep it separate but having a new struct containing the ir type and the union for all the shader types
01:11 karolherbst: or so?
01:11 karolherbst: mhhh
01:11 karolherbst: oh well
01:11 karolherbst: but the bigger issue is, we don't really have this API to ensure the drivers compile already
01:11 karolherbst: and that we could retrieve anything
01:13 airlied: karolherbst: the retrieve api could trigger a compile
01:14 karolherbst: right.. but what do you pass in as the argument? the void* pointer from state_create?
01:14 airlied: yes
01:15 karolherbst: jenatali: anyway, I think I will look into the offset stuff next.. that's kind of all the remaining stuff now :)
01:15 jenatali: karolherbst: Cool, sounds good
01:15 karolherbst: there is still some constant fallout, but uff.. I will figure this out eventually :p
01:16 karolherbst: ahh and async as well...
01:16 karolherbst: but why does the async stuff needs libclc?
01:16 jenatali: It's implemented in libclc for free
01:16 karolherbst: mhhhh
01:16 jenatali: Unless you want to try to actually make it async
01:17 karolherbst: well....
01:17 karolherbst: ampere can do it async.. part of it :D
01:17 karolherbst: supports async global to local natively
01:17 jenatali: The libclc patches provide a pretty clean way to override using the libclc implementation from vtn
01:17 karolherbst: yeah
01:18 jenatali: So, the libclc can be a fallback for when there's not a nir instruction, and then we can add more on top
01:18 karolherbst: I mean, I am fine to use whatever libclc provides.. but is there actually anything really complicated there?
01:18 karolherbst: it's just a copy inside the shader + a wait on it, no?
01:18 jenatali: Nah, not really, to be honest
01:19 jenatali: Just easier to take an existing implementation than have to write a new one
01:20 karolherbst: wondering what libclc ends up doing and how terrible the result is
01:20 karolherbst: heh.. let's see what nvidia does :D
01:21 jenatali: https://github.com/llvm/llvm-project/blob/master/libclc/generic/lib/async/async_work_group_strided_copy.inc
01:22 karolherbst: ehhh
01:22 karolherbst: makes sense I guess
01:23 karolherbst: and the wait is a barrier mhhh
01:23 karolherbst: this can be better :p
01:23 airlied: https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/opencl/src/async/awgcpy.cl
01:23 airlied: is amds
01:24 karolherbst: ehh
01:24 karolherbst: I see more potential here :p
01:24 airlied: it's likely nobody ever uses it if nobody optimises it
01:24 karolherbst: as they insert a hard barrier
01:24 karolherbst: yeah....
01:24 karolherbst: well
01:24 karolherbst: airlied: I doubt nvidia would have added a hw instruction if nobody uses it
01:24 karolherbst: :D
01:25 airlied: karolherbst: you don't know hw engineers then :-P
01:25 karolherbst: :D
01:25 karolherbst: how do I even create the event thingy?
01:26 jenatali: karolherbst: Good question
01:26 jenatali: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6035/diffs?commit_id=b7b00df4bcce9e1f9dee21b03875d9a02a97b6e7
01:26 karolherbst: jenatali: I meant in clc :p
01:26 jenatali: Just declare it
01:26 karolherbst: ahh
01:31 karolherbst: oh well
01:34 curro_: heh, seems reasonable that people aren't doing anything particularly complicated for those intrinsics, it's the kind of optimization with diminishing returns the larger your work-group parallelism is...
01:34 jenatali: Personal opinion: Make it work first, then make it fast
01:35 curro_: but only make it fast if you can find a real-world scenario where it's actually helpful? ;)
01:35 jenatali: Ehhh then you get a chicken and egg problem
01:36 jenatali: But maybe with CL that's actually fine
01:38 curro_: maybe, but maybe it's fine for nobody to use it if the potential performance improvement from that feature isn't sufficient to justify the effort of doing something truly asynchronous... :P
01:38 jenatali: Yep, exactly
01:41 curro_: like, it seems like an optimization that can only possibly reduce the latency of a kernel invocation under certain scenarios, but it won't change the maximum throughput that it can achieve if there are enough workgroups queued up and the GPU is doing smart enough thread scheduling to keep the ALUs reasonably utilized
08:33 mripard: sravn: -rc1 isn't in drm-tip yet and dim apparently checks for that
08:33 mripard: (and I'm not really sure how to get it there)
08:39 mripard: mlankhorst: do you know?
09:36 Vanfanel: pq: just before destroying the GBM surface to recreate it with new dimensions (when I do a resolution change), I have to point the PRIMARY plane to a buffer that is not one of the GBM surface buffers (because I am going to destroy them along with the GBM surface). What I do is point the PRIMARY plane to the original buffer, where the Linux TTY is, but that causes the TTY to be visible for an instant.
09:36 Vanfanel: Is that the right solution? Or is there another solution so I don't get that ugly effect?
09:38 emersion: i think it's safe to hand the buffer to KMS and then destroy it?
09:39 emersion: because it's ref'counted?
09:40 Vanfanel: emersion: what do you mean when you say "hand the buffer to KMS"?
09:40 emersion: use the FB in an atomic commit
09:40 Vanfanel: sorry, english is not my primary language
09:41 emersion: hand over*
09:41 emersion: me neither ;)
09:41 pq: Vanfanel, you can do whatever you want as long as you do not call drmModeRmFB() as long as the old FB is up in KMS.
09:42 pq: if you RmFB an FB on the primary plane, it will turn off the CRTC, IIRC
09:42 emersion: ah
09:42 pq: otherwise, yeah, the kernel ref-counts everything, so you can destroy things
09:43 pq: but it might be awkward to arrange your code to be able to RmFB after the buffer has been "destroyed"
09:44 pq: hence, do not destroy the old GBM surface until you have a new buffer from the new GBM surface up in KMS
09:44 pq: or, keep track of FB separate from the buffer
09:47 pq: I can't say what Weston does in this case, because it does not do on-the-fly video mode changes.
09:47 Vanfanel: pw: that is the problem, I was getting the CRTC disconnected... so thst's why I started pointing the PRIMARY plane to the original FB before destroying the GBM surface...
09:47 Vanfanel: But I am not calling drmModeRmFB() anywhere: this is how I destroy the GBM surface: https://github.com/vanfanel/SDL/blob/ffd4fe8070597f4bb2e757bd4008841b69ebf568/src/video/kmsdrm/SDL_kmsdrmvideo.c#L779
09:48 pq: Vanfanel, you can't really try to "out-run" the kernel like that ;-)
09:48 pq: well, you can't show a GBM buffer on KMS without creating an FB for it, so something must call RmFB as well, or you're leaking them.
09:49 pq: Vanfanel, who does that AddFB call?
09:49 pq: *the
09:50 MrCooper: pq: technically it turns off the plane, not necessarily the CRTC (unless the driver doesn't support enabled CRTC with disabled primary plane)
09:50 pq: okay, makes the screen go blank or black, sans any other planes that might remain
09:51 MrCooper: right
09:51 pq: Vanfanel, what's this: https://github.com/vanfanel/SDL/blob/ffd4fe8070597f4bb2e757bd4008841b69ebf568/src/video/kmsdrm/SDL_kmsdrmvideo.c#L697
09:52 Vanfanel: pq: I am calling it indirectly from https://github.com/vanfanel/SDL/blob/ffd4fe8070597f4bb2e757bd4008841b69ebf568/src/video/kmsdrm/SDL_kmsdrmopengles.c#L116
09:52 Vanfanel: pq: oh, sorry, I had forgotten...
09:53 pq: Vanfanel, make sure you don't RmFB an FB that is active on a KMS plane, and you should be fine. :-)
09:54 pq: you can RmFB is once the next KMS atomic commit that replaces the FB on that plane has completed, so you may need to wait for the flip event too.
09:54 pq: *can RmFB it
09:55 Vanfanel: pq: Oh! So I shouldn't delete the GBM surface until the next one is ready, right? So I can point the PRIMARY plane to a buffer on the new GBM surface before destroying the old one, right? :)
09:56 pq: but all that is really just the exact same "is it released yet?" handing you need to do when re-using a buffer anyway, e.g. before you know you can release it back to a GBM surface.
09:56 emersion: reminder that submitting a FB is a 2-stage process: an atomic commit queues the FB for the next flip, so you have to wait for 2 flip events for the FB to stop being used
09:56 emersion: for the previous FB to stop being used*
09:58 pq: Vanfanel, yes, you can do that. But it is a bit of a temporary memory hog. If you allow an FB to remain after GBM bo is destroyed and ensure the clean up the FB when it's safe, you might not peak that high in memory consumption.
09:59 emersion: pq: hmm, how's that? at some point you'll have both the previous buffer and the new buffer allocated?
09:59 MrCooper: FWIW that's basically how xf86-video-amdgpu/ati handle this, via reference counting on KMS FBs
09:59 pq: yes, not just *a* buffer, but the GBM surfaces with two buffers per surface
10:00 emersion: yeah, wlroots refcounts buffers too
10:00 pq: FBs, not buffers
10:00 emersion: ah, hm
10:00 pq: when you create an FB, it takes a ref on the underlying buffer in kernel
10:01 pq: so destroying the GBM bo and GBM surface should be fine even when the FB remains
10:01 emersion: yes
10:01 emersion: i see
10:01 pq: the important thing is when you call RmFB
10:02 emersion: well, i wouldn't care too much about optimizing for memory consumption for a first version
10:02 emersion: already complicated enough as-is :P
10:02 pq: heh, looking at weston, struct drm_fb indeed has a refcount
10:03 pq: indeed, but then you need to keep two GBM surfaces around for a moment at the same time :-)
10:03 emersion: hm
10:03 emersion: indeed, it's not necessarily simpler
10:03 pq: so you need a delayed destruction for something anyway
10:05 Vanfanel: I am starting to think my temporary workaround of pointing the PRMARY plane to the original FB where the TTY is, is not such a bad idea until I can do it better...
10:06 pq: I'm not sure you can count on that to keep on existing...
10:06 MrCooper: also not sure that's really better than just letting the plane be off temporarily
10:07 emersion: yeah, black screen better than FB-that-might-not-exist-anymore IMHO
10:07 emersion: (might not exist if the previous DRM client wasn't kmscon for instance)
10:08 MrCooper: fbcon, kmscon hasn't replaced that yet in this universe ;)
10:08 emersion: err, right
10:08 Vanfanel: ok, so I can't count on the TTY FB and it's an ugly effect anyway... what should I point the PRIMARY plane to, then? I mean, what should I set the props (and what props) so I get a black screen and no CRTC disconnection happens?
10:08 emersion: zero
10:09 Vanfanel: emersion: FB_ID and CRTC_ID to zero? that's all?
10:09 pq: why not just keep your current code then? the RmFB does the "reset".
10:10 emersion: hm, i think i had some EINVAL from some drivers when disabling the primary plane
10:10 emersion: without disabling the whole CRTC
10:10 pq: I mean, code where you simply let RmFB happen and don't fiddle with any temporary FB commits
10:11 MrCooper: emersion: KMS helper code handles that and disables the CRTC as well in that case
10:11 emersion: some drivers maybe aren't using the helpers?
10:12 MrCooper: amdgpu DC is a bit buggy WRT that, working on fixing it (slowly, on the backburner)
10:12 Vanfanel: I tried setting the FB_ID and CRTC_ID of the PRIMARY plane to zero, and indeed screen gets off and doesn't come up again without going to a different TTY. I am on AMDGPU.
10:16 pq: Vanfanel, when you set the new mode, maybe you're missing some to set some properties to turn it back on?
10:16 pq: *some properties
10:17 pq: agh, I can't make sentences
10:17 pq: Vanfanel, when you set the new mode, maybe you're missing some properties to turn it back on?
10:17 pq: I mean, for now, the CRTC has always been on when you started, so maybe you missed something to begin with.
10:18 pq: like ACTIVE and MODE_ID on the CRTC
10:19 pq: (which you should also set, when you turn things off, so that the CRTC actually turns off for sure)
10:23 Vanfanel: pq: yes. Now I don't get a black screen anymore, but instead I get EINVAL (-22) when trying to set a cursor...
10:23 pq: Vanfanel, is the CRTC on and primary plane active when you try to set a cursor?
10:27 Vanfanel: pq: I think the setcursor() call arrives at the time whiel it's not, sometimes at least, and that's why it EINVALs. (I use a small SDL2 test program where I change screen resolution and SDL2 automatically sets a cursor just after that).
10:27 Vanfanel: Some other times, the commit with the setcursor prop changes DOES succeed.
10:28 Vanfanel: This is all happening if I don't point the PRIMARY plane to the original TTY FB
10:29 Vanfanel: Also, in some SDL2 programs. if I don't point the PRIMARY to the original TTY FB before destroying the surface, what happens is that setcursor() succeeds, but I get a forever-black screen after the program exits, way later
10:32 Vanfanel: pq: so, in sum. not pointing to a "safe" FB before destroying the GBM BO just opens a can or worms here :(
10:34 pq: your analysis is correct, but really fbcon should restore the display after your program exit
10:34 pq: no matter what state you left KMS in
10:35 pq: could it be that your program exit does not restore the VT to KD_TEXT?
10:35 Vanfanel: pq: that happens on the Raspbery Pi (VC4) for example, but not in this computer with AMDGPU
10:35 pq: ahha, sounds like a driver bug then
10:36 Vanfanel: VC4 doesn't show these strange behaviours... but I need SDL2 to work on a wide array of hardware, not only VC4 :D
10:36 pq: "that happens" is the black screen or the fbcon restore?
10:37 Vanfanel: that happens on AMDGPU. VC4 is fine, things go back to normal after the SDL2 tests exit.
10:38 pq: "that" is black screen?
10:38 Vanfanel: pq: yes
10:38 pq: ok, I guess "that" changed meaning there
10:39 Vanfanel: Ah, sorry!
10:39 pq: anyway, still likely a driver bug, IMO - or your program forgets to clean up the VT on exit
10:41 Vanfanel: pq: if it was my KMSDRM backend, I guess it would happen when I do the "safely point primary plane to original FB" thing too. But it only happens when I don't.
10:41 pq: I think there was a pile of heuristics on when fbcon will take over, and I can't remember them or really know then either.
10:41 pq: no, not KMS state. I mean VT state.
10:42 pq: and yes, if you point the primary plane back to the original FB from fbcon, then you are doing what fbcon should be doing anyway
10:44 pq: KMS programs should not restore KMS state on their exit, unless it's for smooth hand-off purposes, and in the smooth hand-off case is not a restore but a simple custom configuration.
10:44 pq: but VT is a different beast
10:45 pq: Vanfanel, you do handle VT, right?
10:45 Vanfanel: pq: I didn't wriote that part of the backend.. let me see
10:45 Vanfanel: *write
10:46 pq: KD_GRAPHICS, KDSKBMUTE, etc.
10:48 pq: or do you depend on logind for all that?
10:49 Vanfanel: pq: I don't see a single mention to those in the whole SDL2 codebase...
10:49 Vanfanel: and I don't even know what logind is, but I believe it's related to X server...?
10:49 pq: Weston has three ways to deal with the VT: let logind handle it, do it in-process, or do it in a setuid-root launcher process
10:50 pq: no
10:50 pq: https://www.freedesktop.org/wiki/Software/systemd/logind/
10:50 pq: logind allows your KMS program to run as a normal user
10:50 pq: there is also elogind for those who don't use systemd
10:51 emersion: note: all of that is Linux-specific
10:51 emersion: there is also https://git.sr.ht/~kennylevinsen/seatd
10:52 pq: otherwise, you need *something* running as root, so that you can open input devices and gain DRM master in all cases. Gaining DRM master is also possible if you're the first to open the device.
10:53 pq: Vanfanel, see in https://gitlab.freedesktop.org/wayland/weston/-/tree/master/libweston : launcher-direct.c is the in-process case, launcher-logind.c is logind, and launcher-weston-launch.c talks to a custom setuid-root program.
10:54 xantoz: I saw that recently it's finally possible to drop DRM master, originally gained by being the first to open the device, as non-root
10:54 pq: launcher-direct is nasty, because it requires you to run the program as root, but it has the low-level code of what needs to happen for proper VT handling
10:55 xantoz: hmm, seatd looks interesting
10:55 emersion: … libseat would do everything for you, for free
10:56 pq: Vanfanel, anyway, this is kind of a side-track. My guess was that some missing setup on the VT might stop fbcon from taking over.
10:56 pq: Vanfanel, or it is simply a DRM driver bug or fbcon bug that it doesn't restore correctly.
10:57 xantoz: fwiw, in mpv we also just restore the FB that we got from the fbcon. I think we still don't actually switch to KD_GRAPHICS, which maybe we should... (but not doing so turned out to be useful on at least my multi-head setups, since I could see terminal output on the other screen...)
10:57 xantoz: but seeing the discussion above, if we switch to KD_GRAPHICS and then back to KD_TEXT, we don't actually need to manually restore the FB?
10:58 pq: xantoz, AFAIU, you should not need to manually restore the original FB.
10:59 pq: if you don't switch to KD_GRAPHICS, then fbcon might do something and walk over you, perhaps
10:59 pq: and KDSKB stuff is necessary if you open input devices directly, so that the tty stop receiving copy of the same input
11:00 xantoz: hasn't been the case on intel, thus far at least. but I don't really know about other drivers of course
11:00 xantoz: pq: currently we don't implement any input handling in the DRM backend itself, so it relies on the tty receiving input
11:00 pq: xantoz, right, that's a choice you can make.
11:01 pq: what hasn't been a case?
11:01 xantoz: the fbcon stepping over us
11:02 xantoz: could even do funky things like show the fbcon over the video if I associate it's fb with an overlay plane (happened by accident once during development)
11:02 pq: well, I guess usually fbcon does not need to do anything but write into its FB, so if you switched the FB away, it doesn't matter
11:02 pq: but try something like monitor hotplug
11:02 xantoz: monitor hotplug might be kind of out of scope for that DRM driver
11:03 pq: it may also be that a DRM master is enough to stop fbcon
11:04 xantoz: s/driver/backend/
11:04 xantoz: it's deliberately kept quite simple/stupid
11:04 Vanfanel: pq: since it's only happening on AMDGPU, I guess it's a driver bug. Anyway, pointing to the original TTY FB before destroying the GBM BO on resolution change does the job, but if I can't rely on TTY FB, that is a potential problem...
11:05 pq: so I'm not quite sure why KD_GRAPHICS is necessary, but just like everything else about the VT, do not try your luck in making the ancient ones angry ;-)
11:05 xantoz: it does need to become somewhat better at figuring out which CRTC to choose though. I remember having problems on panfrost when attempting to do hwdec (which requires an overlay plane, and it didn't always pick the CRTC that actually has an overlay plane)
11:06 pq: Vanfanel, I think you should file a driver bug report to find out what exactly is going on.
11:07 xantoz: last time I attempted to fix that I had problems changing what CRTC was active for the display (eDP internal LCD panel on pinebook pro) using atomic though
11:07 pq: xantoz, what about picking a CRTC based on which connector is connected so you can actually drive that connector?
11:07 xantoz: maybe I'll actually try with KD_GRAPHICS next time I get around to it (need to fix my linux install on the pinebook pro, first...)
11:07 xantoz: pq: that's the current logic
11:07 xantoz: the problem is that by default the CRTC associate with the internal display is the one that only has a primary plane
11:08 Vanfanel: pq: the problem is that I can only work on this in bursts of time every 6-7 months, so I may not be able to keep track of the bug, etc
11:08 xantoz: if I plug in an external display, that usually gets the CRTC with two planes and the hwdec works
11:08 Vanfanel: pq: in fact, I am running out of time and I don't be able to look at it again this year
11:09 pq: xantoz, and both CRTCs could driver either connector, no restrictions in that?
11:09 xantoz: iirc, yes
11:09 xantoz: I think I was able to get it to work with legacy drm
11:09 xantoz: I might've just been doing something wrong
11:09 Vanfanel: pq: in what situation would be relying on original TTY FB be a bad idea?
11:09 xantoz: err, work as in change what CRTC it used. it being legacy, it couldn't use the overlay plane of course
11:10 pq: Vanfanel, when the old FB is not in fact from fbcon, I suppose.
11:11 pq: xantoz, did you change everything in a single atomic commit, or were you trying to do two atomic commits similar to how legacy KMS works?
11:11 xantoz: I think I was trying to do it with two
11:11 pq: Vanfanel, fbcon might not even exist.
11:12 Vanfanel: pq: and what about creating my own GBM BO so I have a secure FB I can count on when I do the GBM surface destruction?
11:12 pq: xantoz, try with one. The case with two very likely fails as the intermediate configuration is illegal.
11:12 xantoz: might try to do it with one next time, will just take some restructuring of the code (if I could just drop legacy DRM support...)
11:13 pq: xantoz, the reason it works with legacy KMS is that legacy KMS is allowed to touch more that it's told to, so it can steal CRTCs from active outputs etc.
11:13 xantoz: yeah
11:13 xantoz: indeed. I remember having to fix the logic of what CRTC is picked to make the atomic mode work better (which is how it ended up at the current logic of picking the currently active CRTC on the selected display, if there is one)
11:13 pq: or do whatever, I don't know what exactly legacy SetCrtc can do that atomic can't
11:14 xantoz: because the old code was picking the first applicable CRTC, IIRC
11:14 xantoz: picking the currently active CRTC also had the aforementioned bonus that it would usually allow me to see the terminal on another monitor :p (with the old behavior it would often steal that monitors CRTC)
11:15 Vanfanel: pq: or, what the hell, why not just create the new GBM, point PRIMARY plane to one of it's buffers, and ONLY THEN destroy the old GBM surface... I dould only get a momentary memory hog, right? no extra CPU usage or anything like that
11:15 pq: right, I think going with a single atomic commit to configure everything should work even when you need to swap CRTCs between connectors.
11:15 xantoz: right. will give that a try
11:15 xantoz: it would be nice to restructure the code towards that in general, really
11:16 pq: Vanfanel, no, that would be totally fine. :-)
11:16 pq: Vanfanel, I thought you thought it would be difficult to write into code. :-D
11:16 xantoz: problem is how to shoehorn in legacy support when you structure the code around the single atomic commit...
11:16 xantoz: maybe using some sort of helper that is drmModeSetCrtc the first time round, and then is drmModePageFlip?
11:17 pq: xantoz, Weston did such refactoring, fwiw
11:17 pq: xantoz, basically Weston collects state internally, and then forks into legacy or atomic commit paths
11:17 pq: when it's time to apply it
11:18 pq: and that collection is over multiple outputs
11:18 xantoz: hmm. yeah I guess that's what you end up having to do...
11:18 xantoz: well, in mpv's DRM backend we only really deal with one output
11:18 Vanfanel: pq: Hmm... no, I just have to { create new GBM surface; get one buffer out of it; point primay plane to that buffer; destroy old GBM surface; } No extra difficulties I can see before trying.. :D
11:18 pq: if you end up with multiple output's update in a single set and you use atomic, then just call the legacy KMS for each output
11:18 pq: but on atomic path you can craft a single commit
11:18 xantoz: might take a look at what weston does for reference then
11:19 pq: xantoz, ask me for pointers when you do, it may be complex to dive in
11:19 xantoz: although might be a bit simpler, since the scope of the backend is single-output only (Basically, if you need multiple output support, you're probably better served by x11 or wayland backends, anyway)
11:19 xantoz: pq: sure
11:21 pq: xantoz, oh, that's *much* simpler. You'll just have to explicitly re-route all connectors and turn on/off all CRTCs in one commit.
11:21 xantoz: yeah
11:21 pq: also connectors and CRTCs you *don't* use, that's the key
11:21 xantoz: you have to poke all of them, even ones that aren't currently in use or are planned to become in use?
11:22 pq: with atomic it's really important to explicitly turn off everything you don't use
11:22 xantoz: because normally there'll be one or two CRTCs. The one that is currently on the selected output, and the one we want to have on that output
11:22 xantoz: the latter might or might not be in use already
11:23 pq: yeah, poke them all into exactly how you want them, leave nothing as-is
11:25 xantoz: right. touch every single CRTC
11:25 pq: if you ignore some CRTC or connector, it's possible it leaves you with an illegal configuration: two CRTCs to one connector or such, and that fails rather than kernel doing mumblemumble to fix it up for you
11:25 xantoz: "touch every single CRTC *and* connector", even?
11:26 pq: the CRTC-connector routings are the important bit, to tear down those you don't want
11:26 xantoz: yeah
11:26 xantoz: guess I'll have to plan in resuscitating the pinebook pro sometime soon
11:27 pq: since connector has CRTC_ID property, I think you should indeed touch that on all connectors
11:27 xantoz: right
11:54 bbrezillon: pendingchaos, jekstrand: anything outstanding in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4917 ?
11:55 bbrezillon: I dropped the atomic load/store patches, and the other patches didn't seem to be controversial
12:05 karolherbst: bbrezillon: what about the virgl fail?
12:05 karolherbst: would be odd if it's caused by your changes, but still, it's a CI fail
12:07 bbrezillon: karolherbst: not my fault :P
12:07 bbrezillon: those fails from time to time
12:27 karolherbst: ahh
12:33 daniels: bbrezillon: is it reported somewhere?
12:33 daniels: I already blacklisted one VirGL test that sometimes crashes this morning, but https://gitlab.freedesktop.org/bbrezillon/mesa/-/jobs/4135864 seems new (cc tomeu gerddie)
12:40 mripard: sravn: I don't really know what changed, but it worked this time
12:46 bbrezillon: daniels: might not be the exact same issue then
12:46 daniels: bbrezillon: yeah, there is a now-skipped dEQP test (CS I think?) which crashed, but segfault running traces is new
12:47 bbrezillon: looks like the jellyfish.rdc file is missing
12:47 bbrezillon: how can you tell it's a segfault BTW? couldn't find a log file containing enough info to figure that out
12:48 daniels: [dump_trace_images] Info: Dumping trace traces-db/glmark2/jellyfish.rdc... ERROR
12:48 daniels: [dump_trace_images] Debug: === Failure log start ===
12:48 daniels: [dump_trace_images] Running: /builds/bbrezillon/mesa/install/tracie/renderdoc_dump_images.py traces-db/glmark2/jellyfish.rdc traces-db/glmark2/test/gl-virgl
12:48 daniels: [dump_traces_images] Process failed with error code: -11
12:48 daniels: [dump_trace_images] Debug: === Failure log end ===
12:49 bbrezillon: oh, so the error code is the signal
12:49 daniels: it's the return code from a process invocation
12:50 daniels: rather than necessarily an actual errno
13:08 MrCooper: emersion: FWIW I was specifically talking about the "RmFB of FB assigned to primary plane" case; atomic userspace has to handle this case itself for atomic commits
14:21 jekstrand: bbrezillon: Need to read it again
14:21 jekstrand: It's been a while
14:25 jekstrand: karolherbst, jenatali: Does OpenCL allow indirect access of vector components? i.e. `vec4 v = foo(); bar(v[i]);`
14:25 jenatali: jekstrand: Don't think so, let me double-check
14:26 jenatali: No: https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#vector-components
14:27 jenatali: HLSL-style swizzling, but not array indexing
14:27 jekstrand: jenatali: If not, then I think the solution to array-deref-of-vec and generic pointers is to just run lower_array_deref_of_vec
14:27 jekstrand: It won't emit any if-ladders if the indices are always constant
14:27 jekstrand: It'll just emit swizzles and write-masks so the worst that can happen is you end up doing a wide load when you don't need to.
14:28 jenatali: Sounds reasonable I think
14:29 jekstrand: It's semi-important for GL/Vulkan when you can do indirects because if-ladders are nasty
14:41 karolherbst: jekstrand: so, I guess I should drop the RMW patch for now?
14:45 jekstrand: karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6372
14:45 jekstrand: karolherbst: That's I think the best back-portable solution
14:45 jekstrand: And it really does need a backport
14:46 karolherbst: ahh
14:46 karolherbst: right.. that's the same patch, no?
14:46 karolherbst: or did I mess up locally?
14:47 jekstrand: karolherbst: Maybe?
14:47 jekstrand: karolherbst: I pushed it off to Jenkins and then played hookie for the rest of the day.
14:47 karolherbst: anyway.. don't mind it being in its own MR I just picked one patch up for !6367
14:47 karolherbst: okay
14:47 karolherbst: bbrezillon: I think we can merge https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6256 now :) unless jekstrand has any concerns?
14:48 jekstrand: karolherbst: ACK
14:50 bbrezillon: karolherbst: queued
14:52 sravn: mripard: Thanks! I am sure it worked because you have magic fingers (or something like that)
14:55 Vanfanel: pq: I am trying but.. is it possible at all to create the new GBM SF and have access to GBM BOs belonging to that GBM SF before calling gbm_surface_lock_front_buffer()?
14:55 Vanfanel: I mean... is there a way to have access to a GBM SF BOs without calling gbm_surface_lock_front_buffer()?
15:10 karolherbst: bbrezillon: why unqueueing?
15:10 karolherbst: ohh
15:10 karolherbst: tags
15:11 bbrezillon: karolherbst: yeah, I wish they were added automatically :-/
15:11 jekstrand:might have a "real" generic pointers patch series soon
15:12 jenatali: jekstrand: \o/
15:12 jekstrand: jenatali, karolherbst: Where do I find the OpenCL CTS?
15:12 karolherbst: yay
15:13 jekstrand: I'm considering giving it a go on iris and watching it burn.
15:13 jenatali: https://github.com/KhronosGroup/OpenCL-CTS
15:13 karolherbst: it is annoying to build though
15:13 karolherbst: always depends on bleeding edge opencl-headers
15:13 jenatali: Yeah, a bit
15:14 imirkin: clover expects somewhat different pipe driver callbacks
15:14 imirkin: unless that's been changed?
15:14 jekstrand: imirkin: I've got a patch for that. :)
15:14 imirkin: ah ok :)
15:14 jekstrand: Yeah, it requires pipe-loader support
15:14 jekstrand: Unless you link all the drivers into clover
15:14 imirkin: no
15:14 imirkin: i mean like
15:15 imirkin: context->set_global_bindings
15:15 imirkin: and whatnot
15:15 imirkin: (or screen->foo)
15:15 jekstrand: Yeah, I've got those too
15:15 jekstrand: Landed them upstream, even
15:15 imirkin: nice
15:15 jenatali: And caught Phoronix's eye with it :P
15:15 jekstrand: I explicitly haven't landed the pipe-loader or CAP enable patches.
15:17 jekstrand: I think if you build with all the gallium drivers statically linked into clover, it will try to advertise OpenCL on iris.
15:18 jekstrand: I've not gone that far out of my way to prevent it
15:18 daniels: jenatali: 'No, they are not porting the Mesa drivers for usage on Windows'
15:18 daniels: jenatali: all this time I wasted learning PowerShell and swearing at msbuild!!
15:18 jenatali: daniels: Where did that come from? :)
15:18 jekstrand: well.... "Porting" has many different meanings. :)
15:18 daniels: jenatali: second para of the Phoronix link
15:19 daniels: *article
15:19 jenatali: Which one? I must've missed this one
15:19 daniels: https://www.phoronix.com/scan.php?page=news_item&px=Microsoft-Doubles-Mesa-Commits
15:19 jenatali: Oh, look, there it is :P
15:20 jekstrand: Looks like someone gave jenatali developer access too
15:20 jenatali: jekstrand: Yeah, airlied did yesterday
15:20 jekstrand:was going to just now
15:20 jenatali: Thanks :)
15:21 jenatali: daniels, airlied: For libclc, after rewriting history a bit, we have a series that we could partially land given what's upstream in LLVM already, but fmod/ldexp/fma callouts can't be done until the last 2 LLVM patches are merged. I'm curious if you think we should land anything now, or wait?
15:23 jenatali: Oh hey, the libclc patches *also* got a Phoronix article: https://www.phoronix.com/scan.php?page=news_item&px=LLVM-libclc-Mesa-SPIR-V
15:24 jekstrand: jenatali: Congratulations! You're well on your way to being a Phoronix super-star. :-P
15:35 jekstrand: jenatali, karolherbst: Are atomics on generic pointers a thing?
15:36 karolherbst: jekstrand: yes
15:36 karolherbst: at least the interface isn't restricted to mem types
15:36 karolherbst: and you can do atomics on all mem types anyway
15:37 jekstrand: karolherbst: Ok. I'll add an if-ladder to lower_io_atomic
15:39 jekstrand: karolherbst, jenatali: I just force-pushed my generic pointers MR. I think it's a lot more complete now.
15:40 jenatali: Cool :)
15:40 jenatali: I'll take another look
15:40 karolherbst: wondering if I should five it a go on the complete CTS :D
15:40 jekstrand: karolherbst: Why do you think I was asking about the OpenCL CTS? :)
15:41 karolherbst: LD
15:41 karolherbst: :D
15:41 karolherbst: I hope I don't break my machine this time
15:45 karolherbst: it's insane on how much we pass :O
15:45 jenatali: :)
15:45 karolherbst: but maybe the CTS just returns 0 :D
15:45 karolherbst: but I think I ixed that
15:45 karolherbst: *fixed
15:46 karolherbst: and I don't even have libclc
15:46 jenatali: Libclc only really matters when you get into the math bruteforce test
15:46 daniels: btw I filed https://gitlab.freedesktop.org/mesa/mesa/-/issues/3437 as a kind of master tracker for various CI issues so we can try to keep on things
15:46 karolherbst: I run 1k tests now :p
15:46 karolherbst: "Pass 408 Fails 46 Crashes 14 Timeouts 5: 44%" right now
15:47 karolherbst: I like my cts runner :D
15:47 jenatali: karolherbst: FYI for how badly bruteforce will fall over, given how many opcodes were just flat out unimplemented: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6035/diffs?commit_id=48fb82fe35ccf1b4928148fb798eed801ef1124d
15:48 karolherbst: yeah.. that's fine though
15:48 karolherbst: but a lot of those CTS tests also bail because I only advertise 1.1... mhhh
15:48 karolherbst: let me put this to 2.2 :D
15:48 jenatali: Bahaha good look
15:48 jenatali: luck*
15:49 karolherbst: ahh I should add support for printf as well
15:49 daniels: jenatali: I need to step out now, but I'll get to looking at the new history first thing
15:49 jenatali: daniels: No worries, thanks :)
15:49 daniels: thanks for doing that!
15:49 daniels: like I said, the end result looked totally good to me, it's just that the working was a bit of snakes & ladders :)
15:50 jenatali: Yeah, reworking it wasn't too bad, I even managed to keep original authorship for most of the patches
15:50 jekstrand:rebuilds, this time against LLVM 10
15:51 karolherbst: ehh
15:51 karolherbst: printf has it's own format of dumping subtests...
15:52 jekstrand: Ugh... Now I can't link!
15:52 karolherbst: :)
15:52 jenatali: The joys of LLVM
15:54 jekstrand: What? Now it decided that it's lib suffix is lib64
15:54 jekstrand: *sigh*
15:54 jekstrand: Got it! I think.....
15:55 karolherbst: famous last words
15:55 jekstrand: hehe
15:55 karolherbst: jenatali: soo.. 1136 subtests :) (excluding conversions)
15:55 jenatali: Including bruteforce?
15:56 karolherbst: jenatali: this is what my runner generates: https://gist.githubusercontent.com/karolherbst/dd3ee49c5e987f8365ffd925552df1f4/raw/8d88270c51a2da8f797ea4d6534b0640cb6d8238/gistfile1.txt
15:56 jenatali: Cool
15:56 karolherbst: and it has this nice progress bar thing :)
15:57 karolherbst: "Pass 158 Fails 7 Crashes 15 Timeouts 1: 16%|█████████████████▎ | 181/1136 [02:10<24:33, 1.54s/it]"
15:57 jekstrand: Does the OpenCL CTS multi-thread?
15:57 karolherbst: by default yes
15:57 jekstrand: Oh, cool.
15:57 karolherbst: but with my runner I force single threading
15:57 jekstrand: Does it gracefully handle test crashes?
15:57 karolherbst: as I run everything parallel anyway
15:57 karolherbst: yeah.. the process just crashes :p
15:58 jekstrand: I'm tempted to use Piglit to run it
15:58 karolherbst: you'd need to parse subtests
15:58 karolherbst: and .. return code is annoying
15:58 jekstrand: :-(
15:58 karolherbst: jekstrand: https://gitlab.freedesktop.org/karolherbst/opencl_cts_runner
15:58 karolherbst: but it requires patches on the CTS
15:59 jekstrand: Oh, man, the OpenCL CTS is one of the old-school Khronos test suites
15:59 jekstrand: Lots of options everywhere
15:59 karolherbst: jekstrand: my downstream patch: https://github.com/karolherbst/OpenCL-CTS/commit/4ae263bdd6642c633cbac37fc69648eb2e86c166
15:59 karolherbst: :D
15:59 karolherbst: yeah
15:59 karolherbst: don't mind having my own runner though
15:59 karolherbst: it's not that much work
16:00 imirkin: jekstrand: but no --make-it-work option
16:00 karolherbst: I probably should just scan for executeables though..mhhh
16:00 jekstrand: They really should include the OpenCL headers in the repo.....
16:01 karolherbst: ehh.. IRC eats double spaces?
16:01 karolherbst: oh well
16:01 imirkin: i don't think so
16:02 karolherbst: imirkin: at least the dri-log doesn't have those :p
16:02 karolherbst: maybe dri-logger eats them?
16:02 imirkin: dri-log is html
16:02 karolherbst: ehhh...
16:02 karolherbst: :/
16:02 karolherbst: HTML eats them
16:02 imirkin: unless it uses white-space:pre to display stuff (which i don't think it does), spaces get normalized
16:02 karolherbst: oh well
16:03 imirkin: if you look at the source, i bet the spaces are there
16:03 karolherbst: anyway the python multiprocessing.Pool stuff is super nice to work with :) that was quite fun
16:03 karolherbst: and tqdm
16:04 jekstrand: karolherbst: Do I need special libclc stuff?
16:04 jenatali: jekstrand: Not yet, libclc patches haven't landed in Mesa yet
16:04 jenatali: !6035
16:05 karolherbst: yeah.. at some point I should have libclc locally as well :D
16:05 jenatali: karolherbst: On that libclc series, I'd appreciate if you could take a look at the Clover commits
16:07 karolherbst: those integer_ops test take sooo much time :/
16:07 karolherbst: and math bruteforce
16:07 jenatali: And conversions
16:07 jenatali: And vload/vstore_half
16:07 karolherbst: vload is fast :p
16:07 jenatali: The half versions aren't
16:07 karolherbst: and conversions isn't that bad either if you subtest it
16:07 jenatali: Conversions isn't bad if you run wimpy
16:07 karolherbst: it's just... 1500 subtests :D
16:08 karolherbst: ahh yeah..
16:08 jenatali: Same for bruteforce
16:09 jenatali: I'm trying to get a conformance log for submission, running on our WARP software driver, and on this laptop I'm using it's been running the conversions test for several days, with the CPU at 100% the whole time...
16:09 karolherbst: ehhh...
16:09 karolherbst: if you run on software that's your fault :p
16:10 jenatali: Yeah, I know - but D3D's got some conformance test gaps which means that hardware drivers aren't necessarily CL-conformant through the mapping layers
16:10 jenatali: We did a lot of bugfixes to WARP as part of bringing up CL
16:10 karolherbst: ahh
16:10 jekstrand: karolherbst: What do I put in -DCL_LIBCLCXX_DIR?
16:10 karolherbst: ehh...
16:11 karolherbst: "/usr/lib64" ?
16:11 karolherbst: dunno
16:11 karolherbst: I think it doesn't use it really
16:11 karolherbst: just at runtime or so
16:11 jekstrand: ok....
16:11 jenatali: Yeah, if you're not running the cpp tests I think it's unused
16:11 karolherbst: I won't touch OpenCL C++ any time soon :p
16:11 karolherbst: it's a 2.2 feature anyway
16:12 karolherbst: they just went nuts with 2.2
16:12 jenatali: It produces the same SPIR-V at the end of the day, so eh
16:12 karolherbst: right...
16:12 jenatali: I don't think drivers are expected to have online C++ compilers
16:12 karolherbst: but imagine those not using llvm :D
16:12 jenatali: The only thing it really adds is init/finalizers
16:12 karolherbst: right..
16:12 jekstrand: Do we not have a new enough CL header in mesa?
16:12 karolherbst: nope
16:13 karolherbst: wait..
16:13 jekstrand: We should just update it
16:13 karolherbst: jekstrand: use 30e1a427dc806c0bfa00ea57aa38438fe465a39d
16:13 karolherbst: that compiles against fedora headers
16:13 karolherbst: I try to update them often enough, but I start becoming lazy :D
16:13 karolherbst: but it's from end of july so.. meh
16:14 jekstrand: /usr/bin/ld: imageHelpers.cpp:(.text+0x9ce1): undefined reference to `clReleaseMemObject'
16:14 karolherbst: ehh
16:14 karolherbst: do you compile standalone or icd?
16:14 jekstrand: icd
16:14 karolherbst: mhhhh
16:14 karolherbst: strange
16:14 jekstrand: And I'm pointing it at the fedora installed loader, I think
16:14 karolherbst: you want to check
16:15 karolherbst: as I am sure clover doesn't exports everything yet :p
16:15 karolherbst: but yeah.. mhh
16:15 karolherbst: strange
16:16 jekstrand: objdump says /usr/lib64/libOpenCL.so exports it...
16:16 karolherbst: ninja -v and see what it links against?
16:16 karolherbst: ehh V=1 with make?
16:17 jekstrand: I'm not seeing -lOpenCL
16:17 jenatali: jekstrand: IIRC there's a CMake define that you're supposed to set to OpenCL
16:17 karolherbst: ohh right.. I should fix the compile callbacks in clover again :D
16:18 karolherbst: jekstrand: check the build_lnx.sh file
16:20 jekstrand: test_conformance/api/test_kernels.cpp: In function ‘int test_execute_kernel_local_sizes(cl_device_id, cl_context, cl_command_queue, int)’:
16:20 jekstrand: test_conformance/api/test_kernels.cpp:185:25: error: ignoring attributes on template argument ‘cl_float’ {aka ‘float’} [-Werror=ignored-attributes]
16:20 jekstrand: 185 | std::vector<cl_float> inputData(num_elements);
16:20 karolherbst: ehh...
16:21 jekstrand:drops -Werror
16:21 jenatali: Yeah...
16:22 jenatali: That code is not very good for warnings
16:25 jekstrand: test_conformance/computeinfo/main.cpp:1243:38: error: ‘union config_data’ has no member named ‘cl_name_version_array’
16:25 jekstrand: 1243 | free(info.config.cl_name_version_array);
16:26 jekstrand: | ^~~~~~~~~~~~~~~~~~~~~
16:26 jekstrand: Wow... THis thing is not very good, is it?
16:28 karolherbst: ehh...
16:28 karolherbst: jekstrand: whats your opencl-headers version?
16:29 jekstrand: whatever fedora has
16:29 karolherbst: and you switch over to 30e1a427dc806c0bfa00ea57aa38438fe465a39d for the CTS?
16:30 jekstrand: Yup
16:30 karolherbst: mhh.. let me see
16:32 karolherbst: ohh ehh... fedora didn't update on fc32.. annoying
16:34 karolherbst: I have a solution
16:35 kisak: airlied: I have to ponder how much of a frankenstein would result from bolting vallium and r600/nir together
16:37 karolherbst: jekstrand: https://copr.fedorainfracloud.org/coprs/karolherbst/OpenCL/ :p
16:41 jekstrand: I just cloned the Khronos OpenCL-Headers repo
16:42 jekstrand: Built!
16:50 jekstrand: Is there some environment variable to override the PIPE_LOADER path?
16:52 jekstrand: apparently not.....
17:01 imirkin: iirc there was a MESA_LOADER_something
17:01 imirkin: ah no, that's just DRIVER_OVERRIDE
17:02 jekstrand: imirkin: No, I had to add one. I want to use a different PIPE_SEARCH_DIR than the one compiled in
17:02 jekstrand: Because I rsync my mesa and sometimes the path is different
17:02 imirkin: right.
17:03 jekstrand: karolherbst: The CTS is trying to run image tests. Is clover advertising support when it shouldn't be?
17:03 karolherbst: I... don't think so?
17:04 MrCooper: IIRC images are part of CL core, not an extension
17:05 jekstrand: I was fairly sure they were optional
17:05 karolherbst: they are optional
17:06 jekstrand: And... linked list corruptions in iris_delete_shader_variants
17:06 karolherbst: :(
17:07 jekstrand:invokes valgrind
17:07 imirkin: they're only required in CL 1.2, afaik
17:07 imirkin: with CL 1.1 they're optional
17:07 jekstrand: ==1187370== Bad permissions for mapped region at address 0xDA0F3A4
17:07 jekstrand: That's one I've not seen before
17:07 imirkin: (in practice, 99% of programs that use CL need images, so ... effectively required)
17:09 jenatali: imirkin: Images are optional in 1.2 too I'm pretty sure
17:10 imirkin: huh, ok. maybe.
17:10 jenatali: jekstrand: There's image patches ready for review whenever you want to take a look ;)
17:10 jekstrand: jenatali: I'm well aware. :)
17:11 karolherbst: ehhh...
17:11 imirkin: jenatali: certainly the core API is required, but perhaps you can weasel out of it?
17:11 imirkin: by saying "oh yeah, sure i support images. just no supported image formats"?
17:11 karolherbst: I thought IsFinite could be implemented as "nir_ine(&b->nb, nir_fabs(&b->nb, src[0]), inf)" mhhh
17:11 karolherbst: but.. the CTS doesn't think so?
17:11 imirkin: aha - there's CL_DEVICE_IMAGE_SUPPORT
17:12 imirkin: which is allowed to be false. k
17:12 jenatali: imirkin: Yeah, exactly, there's no hard requirement for it to be true
17:12 jenatali: Unlike other stuff which has caps which have to return true unless you're embedded (like having a compiler/linker)
17:12 imirkin: right
17:13 jekstrand: Fixed that one. iris needed some longer arrays a few places.
17:13 karolherbst: what is wrong with my isfinite lowering :/
17:13 karolherbst: it fails for nan, but I don't see why
17:15 karolherbst: ehhh
17:15 karolherbst: fne..
17:15 karolherbst: but that fails as well.. mhh
17:15 karolherbst: ohh right..
17:15 karolherbst: it should be an ordered fne, but we don't have that
17:15 karolherbst: :p
17:20 karolherbst: jekstrand: soo.. unordered comparisons?
17:20 jekstrand:walks away
17:20 karolherbst: :D
17:20 ccr: "you can run, but you can't hide"
17:20 jekstrand:can only work on so many problems in one day
17:21 karolherbst: I am fine with implenenting them myself though :p
17:21 karolherbst: it's not that hard, just work
17:21 jekstrand: That's what you think......
17:23 karolherbst: what would be the issue?
17:23 karolherbst: well.. optimizations I guess.. but that still only counts as work
17:27 jekstrand: and basic/constant_source fails
17:27 jekstrand: That doesn't bode well
17:31 jekstrand: Ugh... Shared variables with initializers....
17:31 jenatali: Uh
17:31 jenatali: Is that a 2.x thing?
17:31 jekstrand: I have no idea
17:31 karolherbst: ...
17:31 jekstrand: It's a thing I'm seeing in a CTS test
17:31 karolherbst: ugh
17:32 jenatali: "Variables allocated in the __local address space inside a kernel function cannot be initialized."
17:32 anholt: jekstrand: gross.
17:32 jekstrand: jenatali: I have no idea what CL version clover is advertising
17:32 karolherbst: jenatali: so.. outside it's legal? :p
17:33 jenatali: karolherbst: You can't declare __local variables outside a kernel
17:33 jenatali: Only as kernel args or inside a kernel
17:33 jekstrand: But what about __global variables?
17:33 jenatali: jekstrand: They shouldn't be shared?
17:34 karolherbst: jenatali: ehh.. llvm does emit isifnite incorrectly?
17:34 jenatali: karolherbst: Hm?
17:35 jenatali: jekstrand: https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_initialize_memory
17:35 karolherbst: lol...
17:36 jekstrand: I love how the spec says "we don't have calloc" and so what they add are "you can initialize things arbitrarily"
17:36 jekstrand: There's a very big difference between memset(&foo, 0, sizeof(foo)) and an arbitrary initializer.
17:36 jenatali: jekstrand: What test were you seeing hit that initializer?
17:37 jekstrand: test_conformance/basic/test_basic constant_source
17:37 jenatali: Uh
17:38 karolherbst: ...
17:38 jenatali: You're sure it's a shared variable?
17:38 karolherbst: I think that's constant stuff :p
17:40 karolherbst: jenatali: ehh nope, just nir_lower_bool_to_int32 failing :/
17:41 karolherbst: and for isnormal as well mhhh
17:42 jenatali: Did I do something wrong with the opcodes? Seemed to work fine for us
17:42 karolherbst: do you lower bools to int43?
17:42 karolherbst: *unt32
17:42 karolherbst: ...
17:42 karolherbst: int32
17:43 jekstrand: decl_var shared INTERP_MODE_NONE uint[16] outValues = { { 0x00000011 }, { 0x00000001 }, { 0x0000000b }, { 0x0000000c }, { 0x000007a3 }, { 0x0000000b }, { 0x00000005 }, { 0x000007c1 }, { 0x00000071 }, { 0x00000001 }, { 0x00000018 }, { 0x000007c0 }, { 0x00000007 }, { 0x00000017 }, { 0x000007bb }, { 0x00000061 } }
17:43 karolherbst: heh
17:44 jekstrand: That looks shared to me
17:44 jenatali: karolherbst: No, we keep bools as b1 I think
17:45 jenatali: jekstrand: Is that the output from vtn?
17:46 jekstrand: That's quite a bit later in the compile
17:46 jekstrand: But nothing should be moving stuff *to* shared
17:47 karolherbst: jekstrand: you know what would be fun? if llvm loop unrolls or something :p
17:47 jekstrand: The CL source definitely uses __constant
17:47 jenatali: We definitely get constant (ubo) variables on our end
17:48 jekstrand:is very confused
17:48 jenatali: What's the nir look like right after vtn?
17:48 karolherbst: yeah...
17:48 jekstrand:looks
17:49 jekstrand: It's coming out shared
17:49 jenatali: O.o
17:49 jekstrand:looks at the LLVM
17:50 jekstrand: @outValues = addrspace(2) constant [16 x i32] [i32 17, i32 1, i32 11, i32 12, i32 1955, i32 11, i32 5, i32 1985, i32 113, i32 1, i32 24, i32 1984, i32 7, i32 23, i32 1979, i32 97], align 4
17:50 jekstrand: I don't know what addrspace(2) is
17:50 jenatali: 2 == constant
17:50 jenatali: https://www.khronos.org/registry/SPIR/specs/spir_spec-1.2.pdf section 2.2
17:51 jenatali: jekstrand: What's the SPIRV say?
17:51 jekstrand: UniformConstant
17:51 jenatali: Time to debug vtn then... but I don't see how that could be happening
17:51 jekstrand: yup
18:03 karolherbst: jekstrand: maybe we should get this in before adding unordered ops? https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/4be8f5fa4c5ca6da8945f7ec3a83ca199a4e25a1
18:05 jenatali: karolherbst: What about the rest of the comparisons?
18:05 karolherbst: the are ordered
18:05 karolherbst: by definition in C != is unordered for floats
18:05 karolherbst: everything else is ordered
18:06 karolherbst: (and in glsl and everywhere else)
18:06 karolherbst: that's just so that (a == b) != (a != b) is true
18:06 karolherbst: because if != would be ordered that would be false for some values
18:07 jenatali: Uh, we map fne to LLVM's ONE and haven't had problems
18:07 karolherbst: bcause normally we don't care :p
18:07 karolherbst: but yeah.. mhh
18:08 jenatali: You sure the nir's fne is supposed to be unordered as-is?
18:08 karolherbst: jenatali: the thing is both ordered and unordered ne end up as fne
18:08 karolherbst: so.. it might just be that the test suck
18:08 karolherbst: yes
18:08 jenatali: karolherbst: SPIR-V unordered compare has more instructions added on top of just the fne
18:08 jenatali: Er, one of them does
18:09 karolherbst: jenatali: most of the opts assume fne is actually fneu..
18:09 karolherbst: except I missed one?
18:09 karolherbst: the inot + fne/feq would be wrong if fne is ordered
18:09 jenatali: Huh, UnordNotEqual and OrdNotEqual both get extra instructions
18:09 jenatali: karolherbst: Did you see https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358 ?
18:11 karolherbst: nope
18:11 karolherbst: but all of that is quite messy anyway
18:11 karolherbst: but by definition fne should be fneu by default
18:12 karolherbst: I do emit fne as NEU on nv and that worked perfectly fine
18:12 jenatali: Interesting
18:13 karolherbst: yeah.. it all starts to make sense if you treat nir_op_fne as NEU :p
18:15 jenatali: Hm... that might actually a bug in our backend then - looks like WARP doesn't actually distinguish between ONE and UNE, but I did see some failures on Intel dealing with NaNs...
18:16 karolherbst: might be
18:16 jenatali: Thanks, something to try :)
18:16 karolherbst: but we also never had a strict concept on that anyway
18:16 karolherbst: just that languages treat != as unordered :)
18:16 karolherbst: because that actually makes sense
18:17 karolherbst: the point is, it's confusing everybody :p
18:18 jenatali: Agreed
18:25 airlied: mripard: I can make rc1 appera in tip
18:29 airlied: jekstrand: piglit also has some decent CL tests
18:30 airlied: kisak: read the valluum commit, it can't work with hw
18:30 airlied: nothing can make the hw magically vulkanable
18:31 ajax: "vulkapable", surely
18:32 jenatali: Is vulkanable the opposite of vulkan't?
18:33 jekstrand: I think this one's my fault. :)
18:33 jenatali: jekstrand: I'm curious, what'd you do? :P
18:34 giulianob: If I have two test files that needs to be linked together; must I write an special .exp file for them or is there is another way of doing that?
18:34 jekstrand: jenatali: My handling of vtn_variable_mode_cross_workgroup was bogus
18:36 jenatali: jekstrand: Oh, yeah, I see it now :P
18:38 jekstrand: It looks like the UBO path is basically required for constant right now. Stuff doesn't seem to work without it
18:38 jekstrand: That said, I think I'd like to make it work properly with global
18:38 jekstrand: And maybe add global_constant
18:38 kisak: airlied: fair enough, thanks for making that clear. I expected it would have been a monstrosity after excessive effort.
18:38 jenatali: Makes sense
18:39 airlied: kisak: if you want a non functioning vulkan driver for r600, you'd just port radv
18:39 airlied: since apart from the compiler, the hw model isn't drastically different
18:40 airlied: but it still wouldn't be vulkan or that useful
18:40 jenatali: karolherbst: Thanks for that, switching to UNE instead of ONE actually did fix some hardware failures :)
18:40 karolherbst: :)
18:40 airlied: and the kernel driver would also need major work
18:40 jekstrand: jenatali: Wait... Doesn't CL require that __constant things can't be cast?
18:40 jenatali: jekstrand: Uh, I think just that they can't change address spaces
18:41 karolherbst: jekstrand: you can cast everything
18:41 jekstrand: Right, that's what I meant
18:41 jekstrand: But it has to stay __constant
18:41 karolherbst: no, why?
18:41 karolherbst: conversion != cast
18:41 karolherbst: you can always cast, the value you get is just bogus
18:41 jekstrand: I'm just thinking about how we want to do __constant
18:41 jenatali: "If a pointer into one address space is converted to a pointer into another address space, then unless the original pointer is a null pointer or the location referred to by the original pointer is within the second address space, the behavior is undefined."
18:42 jekstrand: k
18:42 jekstrand: karolherbst: How does clover handle __constant from an API POV? Does it scrape the initializers out of the shader?
18:42 jenatali: That's what we do
18:42 karolherbst: jekstrand: we treat it as global memory for now :/
18:42 karolherbst: but yeah.. that was the plan
18:43 jekstrand: karolherbst: Oh....
18:43 karolherbst: move indirect accessed stuff into nir_shader.scratch and push it as an ubo?
18:43 karolherbst: dunno yet
18:43 karolherbst: it all sucks a bit
18:43 jekstrand: karolherbst: How on earth is that working? It doesn't work at all here. You must have more patches somewhere.
18:43 karolherbst: it doesn't :p
18:43 karolherbst: it works for enough stuff though
18:43 karolherbst: like passing values as constant into the kernel
18:43 karolherbst: that's fine
18:43 karolherbst: but if you start doing indirects and so on.. not so much
18:43 jekstrand: Ok then....
18:44 karolherbst: I have a patch turning it into ubos
18:44 karolherbst: but then in kernel initializers are still broken :)
18:44 jekstrand: My inclination is to put anything that has complete derefs into nir_shader::constant_data
18:44 karolherbst: jekstrand: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/0d0dba34986c1461b4d5d695207f2bb12dec62dd
18:44 karolherbst: jekstrand: yeah.. and we could turn that into a real ubo then
18:44 jekstrand: Or maybe everything that's declared as a variable
18:44 karolherbst: or at least treat it as an kernel input and go from there
18:44 jekstrand: Stuff passed in as parameters would be trickier
18:44 karolherbst: or something
18:45 jenatali: jekstrand: That's pretty much exactly what we do, yeah
18:45 karolherbst: jekstrand: for fun reasons, stuff you push a constant data through kernel args are inside private memory :p
18:45 karolherbst: like passed by value
18:46 karolherbst: but that's all fine
18:46 karolherbst: that's handled by the kernel wrapper
18:46 jenatali: Anything with complete derefs gets promoted to shader_temp (which probably should be switched to constant_data), which we shove into DXIL's immediate constant buffer construct - everything else we convert to global and reflect out for the runtime to fill in buffers
18:46 karolherbst: but yeah
18:46 karolherbst: I'd like to use constant_data for that
18:49 jekstrand: karolherbst: Wait, what?
18:49 jekstrand: karolherbst: So anything that's __constant and is a parameter ends up inline somehow?
18:49 jenatali: The pointer ends up in private memory
18:49 karolherbst: jekstrand: no, I mean stuff you pass in by value
18:49 cwabbott: jenatali: I can confirm that nir_op_fne is supposed to be unordered... the nir comparisons should have the same ordered-ness as the usual C ones
18:49 karolherbst: like struct some_param
18:49 jenatali: It still points to constant
18:50 jekstrand: I thought you could pass a __constant pointer in as an input
18:50 cwabbott: there's some brokenness in vtn and an MR or two floating around to fix it
18:50 karolherbst: that as well :)
18:50 jenatali: cwabbott: Thanks, I confirmed as well, switching to unordered fixed some failures for me :)
18:50 jekstrand: jenatali: [un]ordered comparisons in NIR are generally a problem right now
18:50 karolherbst: yeah. but let's rename stuff first :p
18:50 jekstrand: karolherbst: How do __constant pointers in input parameters work?
18:51 karolherbst: jekstrand: mhhh.. the runtime fills them with whatever value and defines an ABI with the compiler to do whatever... but what the value is is completly undefined, except for alginment rules
18:51 karolherbst: so you can take an address, but it has to be aligned
18:52 jekstrand: karolherbst: So they can't come from arbitrary buffers?
18:52 karolherbst: but the address can be some internal format
18:52 karolherbst: jekstrand: well.... you can pass in SVM pointers...
18:52 jekstrand: Ok.... Global it is then!
18:52 jenatali: jekstrand: They can come from arbitrary buffers
18:52 karolherbst: :D
18:52 karolherbst: jekstrand: but can we please have a load_mem_constant thing or so?
18:52 jenatali: Any CL buffer can be used for either global or constant memory
18:52 karolherbst: we do want to be smarter than just doing normal loads
18:52 jekstrand: karolherbst: Yeah, I think that's what we want
18:53 karolherbst: hw usually has load ops with more aggressive caching etc...
18:53 jekstrand: karolherbst: I think what I'd like to do is have nir_var_mem_constant and lower that to constant_data with pointers that start from nir_intrinsic_constant_ptr_base
18:53 karolherbst: or you can always use L1 cache
18:53 karolherbst: yeah.. maybe
18:53 jekstrand: The driver can do with that whatever it wants
18:53 karolherbst: clover probably could just add another arg :p
18:53 karolherbst: but yeah
18:53 jekstrand: For us, I've got an MR that patches constant addresses directly into the shader.
18:54 karolherbst: I prefer to use global addressing, I just don't want to treat it as global memory
18:54 jenatali: jekstrand: How does one access constant_data?
18:54 karolherbst: driver defined
18:54 jekstrand: jenatali: Right now, there's a nir_intrinsic_load_constant which is an offset. The driver lowers that to something.
18:54 jenatali: I mean, what instructions load data from it?
18:55 jenatali: Got it, thanks
18:55 jenatali: Sounds like that'd work quite well for us as well
18:55 karolherbst: jekstrand: but I also kind of want to be able to treat constant as ubos as the API allows applications to tell: no SVM pointer
18:55 jekstrand: jenatali: For this, though, I think we'll need something a little different. It has to be able to handle both nir_shader::constant_data and stuff that gets passed in as pointers.
18:55 karolherbst: and if you compile against 1.2 it's obviously not SVM either
18:56 karolherbst: well.. maybe it will be
18:56 karolherbst: mhhh
18:56 karolherbst: it's annoying
18:56 karolherbst: but my hope is that ldg.constant is as good as ubo loads
18:56 karolherbst: or close enough
18:56 jekstrand: Right now, nir_shader::constant_data is built around just scraping constants out of the shader source.
18:56 jekstrand: Big constant arrays and stuff like that.
18:57 jenatali: jekstrand: What I'd expect is that there's a pass which promotes some vars to constant_data and converts load_deref => load_constant
18:57 jekstrand: OpenCL is a bit different because it doesn't distinguish well between shader-declared constants and pointers passed in.
18:57 jenatali: For everything leftover you'd run lower_explicit_io to produce a load_global_constant
18:57 jekstrand: jenatali: That pass already exists: nir_opt_large_constants
18:57 jekstrand: Well, it exists as an optimization right now.
18:57 jenatali: Oh huh... sounds like we should switch to using that instead of our hand-rolled one...
18:58 karolherbst: to reduce gprs I guess :p
18:58 jekstrand: We'd need to do something better longer-term though
18:58 jenatali: Hm?
18:58 karolherbst: for nouveau I wrote something like this once: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4945
18:59 karolherbst: but that solves a different issue
18:59 jenatali: jekstrand: You can have a __constant pointer that either points to inline constant data, or to a __constant buffer that was passed as a kernel arg
18:59 jekstrand: jenatali: Right now, that pass is focused towards finding cases where someone did vec4 colors[35] = { ... }; gl_FragColor = colors[gl_PrimitiveID]; or something dumb like that.
19:00 jekstrand: jenatali: So in order to handle __constant buffer data, we'll need to take a more nuanced approach
19:00 jekstrand: I think we still want to promote stuff to __constant when we find it
19:00 karolherbst: mhh yeah...
19:00 jenatali: Sure
19:00 jekstrand: Let me start typing on this and see where I get. :-)
19:00 jenatali: Cool
19:29 jenatali: Ah, the reason we use shader_temp rather than constant_data is because DXIL supports multiple global variables, so we map the app's globals into separate variables. Seems like we could just as easily use one though and just go down the same path as Clover
19:51 jekstrand: cmarcelo, karolherbst, jenatali: Can I get some eyes on this: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6210
20:55 cmarcelo: jekstrand: ack
20:59 jekstrand: karolherbst, jenatali: Do we not have a CL size/align func somewhere?
20:59 jekstrand:has a feeling it's in jenatali's branch
20:59 karolherbst: it's in my MR as well
20:59 jenatali: Yeah, karolherbst has a pick of it
20:59 jekstrand: What MR do I need to review?
20:59 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6367
21:00 karolherbst: I can remove some nouveau bits before merging it in case I want more reviews, but most of the patches are fine :p
21:01 karolherbst: it has a lot of good stuff though you want to have :D
21:34 jekstrand:has the test with a constant working
21:34 jenatali: :)
21:35 karolherbst: nice
21:35 jekstrand: karolherbst: How much do you care about not regressing nouveau clover when we make changes?
21:35 jekstrand: karolherbst: Obviously, we want it working in the end. :-)
21:35 karolherbst: depends on how much work it is for me :p
21:35 karolherbst: but testing is also quite simple
21:35 karolherbst: so..
21:35 jekstrand: karolherbst: I mostly mean can we make changes that break everything in the middle as long as it works in the end?
21:35 karolherbst: just ping me if I should run tests on stuff :p
21:36 karolherbst: ahh
21:36 karolherbst: probably
21:36 karolherbst: it's not like it's different in GL :p
21:36 karolherbst: or do you mean breaking on purpose?
21:36 jekstrand: I mostly mean that sometimes it's easier to just re-build something than it is to try to make incremental changes.
21:37 jekstrand: If the driver is mostly working and useful for people, that's generally not a good thing to do.
21:37 karolherbst: ahh, sure.. but in case you have an MR which does break stuff, I can also write the patch to fix it up :p
21:37 karolherbst: but yeah
21:37 jekstrand: But if it's more of "here's a thing" then it's less worth the effort
21:37 karolherbst: at this point I doubt anybody uses Nouveau for CL
21:37 jekstrand: Yeah, I want it working in the end
21:37 jekstrand: It just may be easier to burn the house down sometimes.
21:38 karolherbst: yeah.. that's fine
21:38 jekstrand:is watching jenatali sweat
21:38 jenatali: :(
21:38 jenatali: But we're so close to being upstream and making you fix it :P
21:38 jekstrand: I don't think I'm going to break stuff for jenatali
21:38 karolherbst: but at this point I think if it works with iris it potentially works with nouveau as well already :p
21:38 jekstrand: At least I can try not to
21:38 karolherbst: so... meh
21:38 karolherbst: and I can always add more lowering
21:39 jekstrand: karolherbst: For instance, with the constant stuff. I'd kind of like to land nir_var_mem_constant and say everyone should just fix their stuff than add yet another option to spirv_to_nir.
21:39 jenatali: Yeah, based on how much we've been discussing design tradeoffs in here I'm not too concerned
21:39 jekstrand: KWIM?
21:39 karolherbst: jekstrand: anyway.. maybe we should merge !6367 already as it brings clover to a .. reasonable state?
21:40 jekstrand: karolherbst: I don't see anything in 6367 that makes anything worse
21:40 karolherbst: most of the patches there are kind of required to make nouveau usefull at all otherwise you trip up at random things
21:40 jekstrand: karolherbst: I think we need to take a serious look at how clover is dealing with nir_lower_io
21:40 karolherbst: well :D it makes shared and private work at least
21:40 jekstrand: hehe
21:40 karolherbst: sure
21:40 karolherbst: I have nothing against reworking everything
21:40 karolherbst: :p
21:41 jekstrand: karolherbst: How long does it take you to run the CTS on hardware?
21:41 karolherbst: uhm.. depends on what tests
21:41 karolherbst: but the normal stuff round about.. 3-4 hours?
21:41 jekstrand: With your CTS runner it seems to take me an hour or maybe two
21:41 jekstrand: Pass 754 Fails 259 Crashes 69 Timeouts 54
21:41 jekstrand: That was my last run
21:41 karolherbst: ohh, that's impressive
21:41 karolherbst: but the timeout stuff is annoying
21:41 karolherbst: soo. there are tests which can run like 20 minutes
21:41 karolherbst: but
21:42 karolherbst: we also have those api tests timing out and slowing everything
21:42 karolherbst: but yeah.. I planned to do a run over the night
21:42 jenatali: I ran the conversions test on Intel with CLOn12 and it ran for 27 hours, FYI
21:42 jekstrand: Yeah, I'm happy to let that crap timeout. :)
21:42 karolherbst: anyway.. feel free to make the reporting suck less.. but at least the output _is_ json and you could extract the stderr of each test
21:42 airlied: I did a piglit run on llvmpipe cl, [710/710] skip: 74, pass: 364, fail: 166, crash: 106
21:42 karolherbst: I just don't save it in a nice format and don't parse it :)
21:42 jenatali: But yeah if you're looking for sanity tests, you can run wimpy versions of most tests
21:43 airlied: was going to merge clc patches locally and try again
21:43 jekstrand: airlied: Is that piglit or the CL CTS?
21:43 airlied: jekstrand: piglit
21:43 airlied: it's got pretty good coverage for builtins and stuff so hit a lot of the libclcc paths
21:44 jekstrand: ah
21:49 jekstrand: Wow, CL has a Nop ocode?
21:50 karolherbst: how pointless
21:50 karolherbst: why would anybody do that..
21:50 jekstrand: that can't be right....
21:50 jekstrand: Oh, no, it's not that
21:50 jekstrand: It's Acos that's missing
21:51 jekstrand: Probably in someone's branch
21:51 jenatali:clears throat
21:51 jenatali: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6035?commit_id=48fb82fe35ccf1b4928148fb798eed801ef1124d
21:51 jekstrand:clearly needs to just review jenatali MRs
21:51 jenatali: :)
21:53 karolherbst: we have int16/8 lowering in nir, no?
21:53 jenatali: nir_lower_bit_size, yeah
21:53 jenatali: Though we did have to add something on top of that, one sec
21:54 jenatali: https://gitlab.freedesktop.org/kusma/mesa/-/blob/msclc-d3d12/src/microsoft/compiler/dxil_nir_algebraic.py#L33
21:54 karolherbst: ohh callback based
21:54 karolherbst: nice
21:54 karolherbst: okay..
21:54 karolherbst: just cleaning my MR a little to only keep "safe" patches to merge it tomorrow
21:58 karolherbst: jekstrand: do you want to do the move to nir_lower_vars_to_explicit_types for everything? Or should I do it as part of the MR?
21:58 karolherbst: kind of prefered to merge as is and add more stuff in new MRs
21:58 karolherbst: curro_: any thoughts on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6367/diffs?commit_id=4df2ab6364c5693ebfbaa8bd04a145bc67757db2 ?
21:59 karolherbst: I know it's not optimal and the backemds (spirv, llvm) should state alignment of the data of the arguments, but this at least makes it spec conformant :)
22:03 jekstrand: karolherbst: I want to see things move over sooner rather than later. I don't care if it happens before or after that MR
22:06 karolherbst: okay
22:06 karolherbst: well, I started to move over there anyway :p
22:07 karolherbst: or at least make shared/private do the right thing
22:09 karolherbst: anyway.. at this point I am more in this "make it work at all" state and optimize later :)
22:13 jekstrand: jenatali: Are you calling nir_lower_vars_to_explicit_types for everything?
22:13 jenatali: jekstrand: No, just shared/function_temp
22:13 jekstrand: 😭
22:13 jekstrand: Wait, do we need bbrezillon's packed struct stuff for vars_to_explicit_types to work?
22:14 jenatali: If you want packed structs to work
22:14 jenatali: I don't think we need it for non-packed
22:14 karolherbst: but essentially we can have packed sutff everwhere :p
22:14 karolherbst: _but_
22:15 karolherbst: no driver would be able to handle it anyway
22:15 karolherbst: so.. meh
22:15 karolherbst: :D
22:15 karolherbst: we really should get alignment figured out before for real
22:19 jekstrand: karolherbst: I suspect we want an explicit_align member for structs
22:20 karolherbst: maybe
22:20 jekstrand: karolherbst: So packed just means that you don't space out members?
22:21 jekstrand: karolherbst: What happens if you have a non-packed struct in a packed struct?
22:21 karolherbst: not directly
22:21 karolherbst: but yeah
22:21 karolherbst: it's more that each member has an alignment of 1
22:21 karolherbst: jekstrand: ehhh.. "C rules apply"
22:22 karolherbst: but I think the inner struct is not aligned, but still has a size of a non packed one?
22:22 karolherbst: dunno
22:22 karolherbst: the translator does the right thing so we won't have to care about any of this
22:22 jenatali: That's how I'd interpret it
22:22 karolherbst: ...
22:22 karolherbst: anyway, the property doesn't get inhereted
22:23 karolherbst: *inherited
22:23 karolherbst: actually..
22:23 karolherbst: the struct is aligned to one and no padding.. right, that's what spir-v says
22:24 karolherbst: but in case of optimizations we might want to be a little smarter? dunno
22:24 jenatali: LLVM sets the right alignment on accesses to packed struct members
22:24 karolherbst: packed structs are evil
22:24 jenatali: So once you have the patches in bbrezillon's series that propagate alignment, you can use that to inform what optimizations are legal
22:24 karolherbst: I don't like them at all
22:24 jenatali: Agreed, hopefully nobody ever uses them
22:25 karolherbst: jenatali: guess what SVM allows you to do :p
22:25 jenatali: Stuping things :P
22:25 jenatali: Stupid*
22:25 karolherbst: yeah..
22:25 karolherbst: oh well
22:25 karolherbst: it is a nice fature
22:25 karolherbst: *feature
22:25 karolherbst: it just allows so much that.. well.. people can make it slow
22:36 airlied: jenatali: the libclc MR needs a fix on rebase I tuink
22:36 jenatali: airlied: What's up?
22:37 airlied: jenatali: the clover code still calls nir_lower_constant_initialisers
22:37 airlied: instead of whatever that is now
22:37 jenatali: Got it
22:39 jenatali: jekstrand: Your comments made me realize I'm missing a patch or two to support the async copy functions, which are (stupidly) core SPIR-V opcodes rather than CL extension opcodes
22:44 jekstrand: airlied: Sorry, I just rewrote your CLC inlining pass in a comment on the MR. :-P
22:45 karolherbst: in case somebody wants to review something more "trivial": https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6377
22:56 airlied: clinfo: ../src/compiler/spirv/spirv_to_nir.c:4402: vtn_handle_preamble_instruction: Assertion `nir_address_format_bit_size(b->options->global_addr_format) == 64' failed.
22:59 Lyude: seanpaul: btw - think you might be willing to look at some of the i915 patches in https://patchwork.freedesktop.org/series/80540/ ?
22:59 karolherbst: airlied: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6367
22:59 karolherbst: you want to have that
22:59 karolherbst: huh wait...
23:00 jenatali: airlied: Sounds like you're compiling spir64 for a 32bit device?
23:00 karolherbst: airlied: how are you even hitting that?
23:00 jenatali: Or maybe you're consuming SPIR-V directly?
23:00 airlied: that's just running clinfo with the libclc patches on master
23:00 karolherbst: uhhh
23:00 airlied: against llvmpipe
23:00 karolherbst: what MR?
23:01 airlied: 6035
23:01 airlied: I think the clover stuff in there needs an updte
23:01 airlied: might have to move option setup to common code
23:02 karolherbst: ohh.. maybne
23:02 karolherbst: yeah..
23:02 karolherbst: we should run the same passes :p
23:02 karolherbst: I guess
23:02 karolherbst: but mhhh
23:02 karolherbst: oh well
23:03 airlied: clinfo: ../src/compiler/spirv/spirv_to_nir.c:4407: vtn_handle_preamble_instruction: Assertion `nir_address_format_bit_size(b->options->ubo_addr_format) == 64' failed.
23:03 karolherbst: actually...
23:03 airlied: which is different
23:03 airlied: and maybe trickier
23:03 airlied: since we currently use 32-bitindex/offset
23:03 karolherbst: constant_as_global = true :p
23:03 karolherbst: mhhh
23:03 karolherbst: I think I really need to take a look at that
23:03 karolherbst: there are... complications making it all super annoying
23:07 airlied: hmm the clc branch explodes a fair bit vs clover
23:08 airlied: https://paste.centos.org/view/e2308756
23:08 karolherbst: ehhh
23:08 karolherbst: is the spirv even valid? :D
23:09 karolherbst: airlied: oh well.. I can probably try to take a look, but I am sure once I wake up tomorrow you already fixed it :p
23:09 jekstrand: airlied: You're running into problems with __constant
23:10 jekstrand: airlied: I've been hacking on that today.
23:10 jekstrand: I'm starting to think that should happen sooner rather than later.
23:10 karolherbst: mhhhh...
23:10 karolherbst: yeah
23:11 karolherbst: jekstrand: the painful part is just, that we rather have to wait with all lowering until we linked the shaders together :/
23:11 jekstrand: karolherbst: ??
23:11 airlied: the spirv has one validation errror
23:11 airlied: error: line 50934: Invalid use of function result id 6296[%__clc_fp16_subnormals_supported].
23:11 airlied: OpGroupDecorate %89001 %__clc_fp16_subnormals_supported %__clc_fp32_subnormals_supported %__clc_fp64_subnormals_supported %x_109 %n_5 %_Z6rotatehh %x_121 %n_17 %_Z6rotatett %x_152 %y_68 %z_15 %_Z7mad_sathhh %x_154 %y_70 %z_17 %_Z7mad_satttt
23:11 karolherbst: like.. if we want to move constant variables into the constant_buffer, we would like to dce stuff first
23:12 jekstrand: airlied: Yeah, your problem is entirely spirv_to_nir being completely bogus for __constant
23:12 karolherbst: and that :p
23:12 jekstrand: karolherbst: Sure
23:12 airlied: jenatali: do you see that spirv val error on the the libclc lib
23:13 karolherbst: airlied: I have to say.. we are probably the only ones dealing with such huges spirvs :D
23:13 airlied: karolherbst: did we hack around this in the translator before, since this path used to at least load the library
23:13 jenatali: airlied: Honestly haven't run the validator on it before
23:14 jekstrand: airlied: It probably works ok if you set the UBO path for constants like jenatali is doing
23:14 karolherbst: ohh, right.. wait
23:15 airlied: jenatali: https://paste.centos.org/view/raw/c7eabfa1 is my current diff to your libclc MR merged into master
23:15 karolherbst: airlied: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/0d0dba34986c1461b4d5d695207f2bb12dec62dd
23:15 airlied: jekstrand: indeed since I used that path previously
23:15 karolherbst: it might work with that patch
23:15 karolherbst: wel...
23:15 karolherbst: with the nir/invocation.cpp bits
23:16 jenatali: airlied: Yeah, I think the problem is that it needs to use the pack64 address mode - I had lots of problems trying to get vtn to accept vec2 addressing modes for CL
23:17 jenatali: Crazy things like ulong->void* conversions blow up when the void* is a vec2
23:17 karolherbst: weird stuff :p
23:17 karolherbst: jenatali: fun fact, you can have the same shit in GL :p
23:17 jenatali: Oof...
23:18 karolherbst: but I think you need to turn on a couple of NV extensions to get that implicitly
23:18 karolherbst: bindless_textures and then more crap
23:18 airlied:will await further updates then :-P
23:18 karolherbst: soo bindless_textures had the great idea of making the texture handle a vec2 thingy :D
23:18 karolherbst: or well
23:18 karolherbst: implicitly convertable at least
23:19 karolherbst: but also you can cast vec2 to a sampler
23:19 imirkin: explicitly convertible.
23:19 karolherbst: implicitly with the NV extensions
23:19 imirkin: (minor point, but i'm pedantic like that)
23:19 jenatali: airlied: Yeah, I need to actually get a setup to try out clover... I've been mainly making changes to the patch series against CLOn12 in our downstream fork and then picking them back over... but obviously that doesn't work for the Clover patches
23:19 airlied: jenatali: yeah I'd like to get llvmpipe into CI at some point
23:19 airlied: just not close enough yet to be really useful
23:19 imirkin: karolherbst: which ext allows implicit conversions between uvec2 and sampler*?
23:20 imirkin: karolherbst: how would that even work -- texture() needs a sampler type.
23:20 karolherbst: imirkin: I am sure nvidia just accepts that with NV_bindless_textures, no?
23:20 imirkin: you have to cast it explicitly to sampler2D or whatever, i.e. uvec2 foo = ...; texture(sampler2D(foo), ...)
23:20 jenatali: airlied: Do we have a plan for getting libclc into CI? Are we going to switch to LLVM 12? Or check in a spv for libclc? Or...
23:20 imirkin: or sampler2D bar = sampler2D(foo);
23:20 imirkin: you can't just say sampler2D bar = foo
23:21 airlied: jenatali: have to work out how to get spirv-llvm-translator in first :-P
23:21 jenatali: :P fair
23:21 airlied: but yeah then probably check libclc in
23:21 airlied: we likely need someone to package spirv-llvm-translator for debian against the various llvms at lleast
23:22 jekstrand: Just build LLVM in the container build. That won't slow things down that far. They're already building multiple kernels. :-P
23:22 imirkin: karolherbst: yeah, look at the "Conversion and Scalar Constructors" modification to section 5.4.1
23:22 imirkin: they use uint64_t instead of uvec2, but same diff
23:22 airlied: jekstrand: definitely a winning soln :-P
23:22 karolherbst: ahh
23:22 karolherbst: oh well...
23:22 jekstrand: airlied: It builds well enough against distro LLVM
23:22 jekstrand: I don't see why we can't build spirv-llvm-translator as part of the CI build.
23:22 karolherbst: at least the NV extenions was sane to use int64_t...
23:23 airlied: jekstrand: yeah it should be fine, if it works against distro LLVM as well
23:23 imirkin: karolherbst: no uint64_t in unextended GL
23:23 airlied: then we just need to include a prebutild spv lib
23:23 jekstrand: airlied: I'm using it against Fedora LLVM 10
23:23 karolherbst: I know :D
23:23 karolherbst: but vec2.. ufff
23:23 imirkin: uvec2
23:23 imirkin: ARB_bindless_texture didn't want to depend on ARB_gpu_shader_int64 i guess
23:23 airlied:would like to avoid having to rebuild libclc
23:23 karolherbst: anyway, I am still waiting until we pass all piglit tests :p
23:23 karolherbst: then I'll write more
23:23 karolherbst: :D
23:24 jenatali: airlied: libclc isn't too bad to build, only takes a minute
23:24 airlied: jenatali: it's source is in a very large github repo though
23:24 jenatali: True
23:24 karolherbst: airlied: cache it like we cache the ccache cache :p
23:26 kisak: considering it's a slow moving target, can gitlab CI build it as a separate container target and heavily recycle it in building the main set of containers?
23:27 kisak: ^llvm/libclc/new-spirv-goodies
23:28 airlied:is afraid of touching CI for new things, don't want to cost X.org 20K because I do something wrong :-p
23:29 bnieuwenhuizen: airlied: is it in the LLVM repo?
23:29 airlied: bnieuwenhuizen: yes
23:29 bnieuwenhuizen: I think that repo was used for a while during container builds
23:30 bnieuwenhuizen: though I think these days we're able to use a .deb :)
23:30 jekstrand: jenatali, karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6379
23:30 jekstrand: That gives us the NIR and SPIR-V plumbing for doing __constant with a new variable mode.
23:30 karolherbst: noice
23:30 jenatali: Sweet
23:30 jekstrand: It doesn't quite have the plumbing for clover yet. I've got hacks that work for iris but I don't know what to do for nouveau.
23:30 karolherbst: jekstrand: constant_base_ptr is just for the constant table though, no?
23:31 jekstrand: karolherbst: Yes
23:31 karolherbst: jekstrand: ehh. treat it like global memory for nouveau :p
23:31 karolherbst: it should work the same
23:31 jekstrand: karolherbst: Sure. Lower it to global.
23:31 jekstrand: karolherbst: I plan to lower it to global as well
23:31 jenatali: Same
23:31 karolherbst: but I think I would actually do proper instructions
23:31 karolherbst: because.. that's not a lot of work
23:31 jekstrand: But we need some way to say "here's your table"
23:31 karolherbst: yeah.. easy
23:31 karolherbst: let clover lower it to a new constant arg
23:32 jekstrand: That works
23:32 jekstrand: I've also got code for iris to embed it in the shader itself.
23:32 karolherbst: with user_ptr pointing into the nir_shader? :D bummer we can't do that
23:32 karolherbst: :D
23:32 jekstrand: I don't think there's actual a difference at the end of the day.
23:32 karolherbst: yeah...
23:32 karolherbst: shouldn't matter
23:33 karolherbst: I'd just like to be able to at least optimize that one to ubos...
23:33 karolherbst: at least this one buffer :D
23:33 jekstrand: karolherbst: The way my iris MR works is that it sticks it at the end of the program after the 2-3 binaries. Then, once we upload it and have an actual address, I back-patch the shader to stick the address in there. No binding involved. :D
23:33 karolherbst: mhhhh
23:33 karolherbst: :D
23:33 jekstrand: But I would be 100% happy for clover to just stuff it in an input buffer for me
23:33 karolherbst: we can't do those stupid tricks :D
23:34 karolherbst: yeah..
23:34 karolherbst: I can play around with it a bit
23:34 karolherbst: I want to figure out adding more arugments for offset lowering as well
23:34 jekstrand: But I've got those patches working with iris
23:34 jekstrand: But iris already supports nir_shader::constant_data
23:35 jekstrand: Right now, it's a UBO
23:35 karolherbst: right.. and for us that wouldn't be all that painful.. it's just annoying as you have to bind/unbind buffers... :/
23:35 karolherbst: well
23:35 karolherbst: anyway.. I will figure something out I guess
23:35 jekstrand: Someone's got to bind it. Either clover has to figure out how to add arguments or you have to add them in your back-end
23:36 jekstrand: Or you do clever tricks with patching addresses into shaders. :D
23:36 karolherbst: I'd like to have clover deal with it tbh :p
23:36 jekstrand:needs to twist Kayden's arm so he reviews that MR
23:36 karolherbst: jekstrand: patching addresses inside shaders just doesn't give us anything
23:36 jekstrand: Yeah, you really want bindings for performance
23:37 jekstrand: I think having clover deal with this is a good plan
23:37 jekstrand: I'd also like to have it deal with inputs via cbuf0
23:37 jekstrand: For that matter, It could put both in the same cbuf
23:37 karolherbst: yep
23:37 karolherbst: ohh wait
23:37 karolherbst: you can't :D
23:38 jekstrand: Why not?
23:38 jekstrand: Just put a pointer to cbuf0 in cbuf0
23:38 karolherbst: jekstrand: in shader constants are allowed to take as much space as a full constant buffer :)
23:38 jekstrand: They actually have a rule for that? Neat
23:38 karolherbst: I think so..
23:38 karolherbst: at least that's what nvidia uses to fail on
23:38 karolherbst: :D
23:38 jenatali: Yeah, I think that's right
23:38 karolherbst: we already tried that :p
23:39 jekstrand: Ok, cbuf0 and cbuf1 then
23:39 karolherbst: yeah...
23:39 karolherbst:sees another cbuf going away
23:39 karolherbst: that stuff is not fun if you only have 5 cbs
23:39 jekstrand: karolherbst: You were going to loose it one way or another
23:39 karolherbst: uhm.. 8, but 3 used for random crap
23:39 karolherbst: I guess
23:39 karolherbst: :D
23:39 karolherbst: anyway
23:40 karolherbst: we don't really plan on using cbs for the constant arguments, soo...
23:40 karolherbst: whatever
23:40 jekstrand: karolherbst: And, as long as you're lowering ALL local variables to scratch, why do you care about cbuf perf? :-P
23:40 karolherbst: I don't lower _ALL_ to scratch :p
23:40 jekstrand: karolherbst: Yes, yes you do.
23:40 jekstrand: Unless I'm totally misreading the clover code
23:41 jekstrand: I guess you run one copy_prop_varas
23:41 jekstrand: Oh, and one vars_to_ssa
23:41 karolherbst: :D
23:41 karolherbst: :p
23:41 jekstrand: So only a lot of them
23:41 karolherbst: only indirects I hope
23:41 jekstrand: Oh, it's going to be a lot more than indirects
23:41 karolherbst: but yeah..
23:42 karolherbst: if somebody wants to make it not suck, please go ahead :p
23:42 jekstrand: Which is to say, vars_to_ssa will pick up a lot of the obvious ones but the less obvious ones will need more optimization passes to run before we can really SSA or copy-prop them.
23:42 karolherbst: yeah.. I guess
23:43 jekstrand: Is there a way for iris to just ask clover to leave derefs be?
23:43 karolherbst: I never spend time tuning nir passes, so I want to leave it to somebody knowing that stuff
23:43 jekstrand: I want explicit types but not nir_lower_explicit_io
23:43 karolherbst: mhhhh
23:43 karolherbst: CAPS?
23:43 jekstrand: Ugh
23:44 karolherbst: the same way we do it for st/mesa :p
23:44 jekstrand: Or I could just ask you nicely to move those 5 lines to nouveau
23:44 karolherbst: I don't want to call this finalize_nir stuff.. because.. you know...
23:44 karolherbst: ...
23:44 karolherbst: *big sigh*
23:45 karolherbst: jekstrand: yeah.. I guess I have those in nouveau already anyway
23:45 karolherbst: at least some
23:45 jekstrand: I'm really not sure what the best optoin is at this point.
23:45 karolherbst: me neither
23:45 jekstrand: I used to have a strong opinion that drivers should just do whatever they need.
23:45 karolherbst: at this point I don't care
23:45 jekstrand: But then people decided that gallium should do it
23:45 karolherbst: and once perf gets important I might
23:45 jekstrand: except that does nothing for i965 or any vulkan drivers
23:45 jekstrand: And I'm very reluctant to add yet more stuff to nir_compiler_options and have a core NIR thing.
23:45 karolherbst: PIPE_CAP_LEAVE_MY_NIR_ALONE :p
23:46 jekstrand: srsly
23:46 imirkin: the point is that various managed resources have to line up with how they're referenced in the shaders.
23:46 jekstrand: Yeah, there's definitely some lowering that gallium needs to do
23:46 imirkin: and additionally you have stupid things like draw/etc shaders which are meant to apply on top of existing things
23:47 jekstrand: But does it need to have its own optimization loop?
23:47 karolherbst: yes
23:47 karolherbst: for stupid reasons :p
23:47 imirkin: so it has to know how to modify existing things
23:47 karolherbst: like you want to turn indirects into directs through optimizations
23:47 imirkin: i don't know about an opt loop - that seems silly
23:47 karolherbst: or in some places you have to know the constant value
23:47 karolherbst: I had to deal with some of that inside nouveau where gallium wasn't that good at this stuff yet
23:47 jekstrand: I'm not saying you shouldn't have an opt loop
23:47 jekstrand: I'm questioning whether gallium should have one
23:48 karolherbst: it needs it for lowering
23:48 jekstrand: Those are different questions
23:48 jekstrand: It might need copy-prop and constant folding
23:48 karolherbst: yes
23:48 jekstrand: But it doesn't need the works
23:48 imirkin: there were various PIPE_CAP_SHADER_CAN_DO_X for tgsi, since the implication was that the opts should be done PRIOR to tgsi, i.e. glsl ir/etc. but if it's all in nir, not much point...
23:48 karolherbst: ohh sure.. not saying it needs everything, but yes
23:48 karolherbst: it seems some stuff
23:48 karolherbst: *needs
23:49 jekstrand: karolherbst: In any case, feel free to copy+paste brw_nir_optimize into nouveau and call it good
23:49 karolherbst: at this point I just leave it alone until perf becomes relevant :p
23:52 jekstrand: That's fair, I suppose.
23:52 jekstrand: In any case, I think getting constants sorted out seems like one of the next things.
23:52 karolherbst: I really would like to do some better optimization, but.. uff.. that all feels quite pointless at the point when you can't even reach 5% of the blobs perf
23:52 karolherbst: yeah
23:53 karolherbst: I have offsets and printf on my list
23:53 karolherbst: and offsets will also deal with added parameters
23:53 jekstrand: Yeah
23:54 karolherbst: printf as well now that I think of it
23:54 karolherbst: we need to inject the printf buffer address
23:54 jekstrand:doesn't want to think about printf
23:54 karolherbst: ehh.. not that bad :p
23:54 jekstrand: Yeah
23:54 jekstrand: It's just annoying
23:54 jekstrand: and kind-of terrible that it exists
23:55 jenatali: Indeed
23:55 karolherbst: I think OpenCL happens if you only think about developers :p
23:55 karolherbst: there is really soo much stuff in it which could have been "this awesome library on top of CL you can link against" but no, CL had to include everything :p
23:55 jekstrand: jenatali: It'd be good if you could give at least a "first impression" of my nir_var_mem_constant MR. I don't want to rip the rug out from under you but I also want to get that little mess cleaned up before it gets any worse.
23:56 airlied: maybe we can create something from lvl0 that is cross vendor and doesn't suck as much
23:56 jenatali: jekstrand: Looks more or less fine
23:56 karolherbst: airlied: like CL on top of lvl0 :p
23:56 jekstrand: Or CL on top of Vulkan
23:56 airlied: karolherbst: CL on top of lvl0 would at least abstract the insane stuff a bit more
23:56 jenatali: Or CL on top of D3D12 :P
23:56 karolherbst: airlied: yeah.. my point :p
23:57 airlied: the problem moving forward layering CL on top of graphics is SVM and synchronisation
23:57 karolherbst: and 64 bit
23:57 karolherbst: and... :p
23:57 karolherbst: multi device also a huge problem
23:57 jenatali: Indeed
23:57 airlied: lvl0 isn't multidevice
23:57 jenatali: CL is
23:57 airlied: yeah so still easy to hide all that insanity in the CL abstraction somehow
23:57 karolherbst: CL multidevice is quite massive actually...
23:57 karolherbst: that's so insane
23:58 jenatali: Yeah... not "easy"
23:58 jenatali: Doable, yes
23:58 karolherbst: airlied: CL multi device allows you devices from different drivers :p
23:58 airlied: the bigger problem at least on Linux is that compute APIs are allowing apps to do insane thing
23:58 jenatali: karolherbst: Not necessarily
23:58 karolherbst: well.. you have to deal with platforms, yes
23:58 karolherbst: but...
23:58 airlied: and graphics APIs rightfully refuse those
23:58 karolherbst: you can still do it
23:58 jenatali: karolherbst: If you force Clover to use different platforms for different drivers, you can't use a single context with multiple platforms
23:59 karolherbst: a CL context doesn't really mean much anyway
23:59 airlied: when I started looking at lvl0 on top of radv, I realised it would sit on kfd better, but I don't want to sit on kfd
23:59 karolherbst: because of multi device
23:59 karolherbst: I think buffer sharing and that's it?