05:56 imirkin: i've been having this problem for a while, where chrome takes up 180% of cpu time for basically nothing in its "browser" thread. has anyone debugged anything like that before?
05:57 imirkin: no idea if it's mesa/X/rendering related or not
06:01 imirkin: hm, wiped some stuff in .config/chromium, seems to be better? we'll see...
07:01 austriancoder: tomeu: I saw 76330374413ee8ab857aa35f249652867be8981e but no docs on how to setup "a caching MinIO instance local to the DUT" - could you add one somewhere under docs/ci/ ?
07:55 MrCooper: bl4ckb0ne: git grep backports .gitlab-ci*
08:14 emersion: vanfanel: it's the result of drmModeAtomicGetCursor
08:15 emersion: basically, an atomic req is a list of obj+prop+value, you can append items to teh list with AddProperty
08:15 emersion: if you want to rollback some changes, you can "save" the state of the req with GetCursor, and "restore" it later with SetCursor
08:34 pq: vanfanel, I never believed that mixing legacy and atomic KMS API is a good idea. ;-) I don't know how the hotspot thing would interact.
08:38 pq: vanfanel, fences are used for ensuring correct order of operations: that a buffer is never simultaneously written and read, because that lead to reading incomplete drawing and hence garbage on screen.
09:06 tomeu: austriancoder: we don't have one yet, but when we add it, we'll also write docs about it
09:06 tomeu: austriancoder: in the meantime, maybe you can use nginx or any other reverse proxy? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4024
09:07 austriancoder: tomeu: I am not in a hurry :)
09:08 daniels: melissawen: great blog post & great work! :)
09:09 daniels: imirkin: if you have a tab open which is waiting for attention (most often due to authorisation dialog) then it will burn your CPU to the ground until you acknowledge it
09:11 melissawen: daniels, oh, thanks! :)
09:14 emersion: melissawen: can you give the link to the blog post?
09:15 daniels: emersion: it's on https://planet.freedesktop.org
09:16 emersion: eh, didn't know this existed
09:16 emersion: thx
09:16 emersion: direct link: https://melissawen.github.io/randomness/2020/08/12/end_of_endless.html
11:31 karolherbst: marge has quite a lot of MRs to deal with :D
13:20 marcodiego: I'd just seen the Vallium merge (https://cgit.freedesktop.org/mesa/mesa/commit/?id=b38879f8c5f57b7f1802e433e33181bdf5e72aef). Will it, in the future, be able to use hardware (OpenGL) features to support accelerated Vulkan on hardware that is not meant to support Vulkan?
13:21 marcodiego: I mean I have a non Vulkan capable SoC that has reasonably good OpenGL support. Can I hope that Vallium will allow me to run accelerated Vulkan on such hardware in the future?
13:24 bbrezillon: jekstrand, karolherbst: ok, so I've changed https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5900 to use explicit types instead of size_align() funcs. I kept the offset/stride calculation done in VTN, as this seems to be used by VTN itself
13:30 bnieuwenhuizen: marcodiego: not vulkan capable in the sense that the HW doesn't support it or the driver is not implemented?
13:30 bnieuwenhuizen: I think for the former it is going to be hard to get a conformant vulkan implmentation even with Vallium :)
13:32 marcodiego: bnieuwenhuizen, I mean when the hardware does not support it
14:03 degasus: marcodiego: if the HW does not support it, how shall vallium be able to get rid of this limitations. If the driver does not support it, it might be easier to implement Vulkan instead of the gallium interface.
14:05 imirkin: i think some people are under the mistaken assumption that software can somehow "fill in" for missing hardware functionality and work harmoniously with hw accel
14:05 imirkin: in case there's any confusion -- this is not the case
14:05 imirkin: there are highly limited circumstances where you can have "software" fallbacks, but these largely apply to very old hardware and APIs which was designed with such usage in mind.
14:08 degasus: marcodiego: just be clear, if the "reasonably good OpenGL" driver is not mesa based, Vallium will never run on top of it
14:09 marcodiego: imirkin, hmmm... elucidating comment. Basically I can assume that it is not feasible to implement Vulkan using a hybrid approach using OpenGL and software, is that?
14:09 imirkin: marcodiego: well firstly, you can assume that it's not possible to implement vulkan using opengl. forget software.
14:11 marcodiego: imirkin, what is so complicated about a software vulkan implementation?
14:11 imirkin: nothing
14:11 imirkin: you could totally use vallium to get vulkan on that SoC, assuming llvm runs on it
14:14 marcodiego: imirkin, so, what you mean by "forget software"?
14:14 bnieuwenhuizen: if you fill in too much with software your perf basically tanks
14:14 imirkin: i mean you can't implement vulkan with opengl
14:14 imirkin: opengl + software doesn't change that picture
14:14 imirkin: you can have a pure-software impl, which is what vallium is designed for
14:14 degasus: bnieuwenhuizen: no, that's the point. If you fill in *any* software, your perf tanks
14:15 imirkin: depending on the things filled in, your perf could go lower than just pure software :)
14:15 imirkin: it's not even worth thinking about.
14:16 marcodiego: hmmm... thanks, I think I understand now
14:24 tomeu: marcodiego: what's the GPU, btw?
14:25 marcodiego: tomeu, Mali T860
14:26 tomeu: marcodiego: oh, Arm has a vulkan driver for that one
14:26 tomeu: I heard they needed to do some tricks, though
14:27 marcodiego: tomeu, didn't know that. Can I hope for a vulkan driver for it with mesa?
14:30 tomeu: you certainly can hope for one, but I think there will be first one for the Bifrost GPUs, with Midgard coming afterwards
14:34 marcodiego: tomeu, cool! Is that planned? Anywhere I can track it?
14:35 tomeu: marcodiego: so far it's only conversations in #panfrost , some people have branches with experiments, but I don't remember where
14:36 marcodiego: tomeu, thanks! I'll take a look. Happy to know that
14:37 tomeu: yw
14:56 MrCooper: bnieuwenhuizen: "blit vs. pageflip is at the moment pretty much internal to the xserver" not really, XCB_PRESENT_COMPLETE_MODE_FLIP vs XCB_PRESENT_COMPLETE_MODE_COPY (vs XCB_PRESENT_COMPLETE_MODE_SUBOPTIMAL_COPY)
15:18 karolherbst: jekstrand: you have by any chance no patch to fix passes if encountering stores into vector components?
15:18 karolherbst: it's for this patch: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/52e99a672b165db5308b0b574dc5ae630ad566e7
15:19 karolherbst: I don't think we will find a better solution than fixing nir :)... sadly
15:19 jekstrand: karolherbst: We can drop that path and flip on the lowering pass
15:20 jekstrand: karolherbst: nir_lower_array_deref_of_vec
15:20 karolherbst: mhhh
15:21 karolherbst: ehh. what does it do?
15:21 jekstrand: It gets rid of array derefs on vectors
15:21 karolherbst: and replaces them with what?
15:21 jekstrand: if ladders
15:21 karolherbst: ....
15:22 karolherbst: I am not sure if that's worse or better
15:22 jekstrand: Its really the same code, the only difference is that it's based on NIR variable modes rather than is_block stuff inside VTN
15:22 jekstrand: So it will lower the right stuff and not the wrong stuff. :)
15:22 karolherbst: the issue is, we can't write the full vector
15:22 karolherbst: so any read+write thing is wrong
15:22 jekstrand: Yes
15:22 jenatali: jekstrand: We shouldn't need if ladders if we're running nir_lower_explicit_io
15:23 jekstrand: jenatali: Correct, sort-of
15:23 karolherbst: and if ladders are also annoying, aren't they?
15:23 jekstrand: You still need it for shader_temp and function_temp because lots of optimizations which break.
15:23 karolherbst: but I guess some drivers need them
15:23 karolherbst: jekstrand: I'd rather fix the passes
15:23 jenatali: jekstrand: Even if we're going to use lower_explicit_io on function_temp?
15:23 jekstrand: karolherbst: That's a lot of work. I've started in on it but oof....
15:23 karolherbst: I know...
15:24 karolherbst: do you have an idea with a similiar or better end result?
15:24 jekstrand: karolherbst: What I'm recommending is that we get rid of that bit of VTN code and then run lower_array_deref_of_vec for shader_temp and function_temp.
15:24 jekstrand: That way we get if-ladders there and nowhere else.
15:24 karolherbst: mhhhhhh
15:24 jekstrand: Oh, and manybe on input/output for 3D shaders
15:24 karolherbst: but how does it get around the reason we had to load the full vector?
15:24 jekstrand: (probably lave input alone for kernels)
15:24 karolherbst: or is that just because of broken apsses
15:24 karolherbst: *passes
15:24 jekstrand: I'm confused
15:25 jekstrand: My understanding was that the problem is that NIR is giving you load-modify-store for shared when it shouldn't
15:25 jenatali: Right
15:25 karolherbst: so the reason the code I removed is wrong is, that it _loads_ the vector, changes one component and writes the full one
15:25 jekstrand: load-modify-store is fine for shader/function_temp
15:25 karolherbst: it's not
15:25 jenatali: That logic would be fine for function_temp, honestly, and I'd rather have that than if-ladders
15:25 karolherbst: ehh
15:25 karolherbst: well.. I guess it's fine for temp indeed
15:26 jekstrand: It's not for anything that might be accessed cross-thread or cross-workgroup because it races.
15:26 jenatali: So... we should just update the modes where it's used so we don't use it for shared/global?
15:26 jekstrand: jenatali: Which logic would you rather have than if-ladders?
15:26 jenatali: jekstrand: Read-modify-write
15:26 karolherbst: why do we need if ladders at all?
15:26 jekstrand: jenatali: The if ladders will go away
15:26 jekstrand: jenatali: Don't get hung up on that. They'll be turned into bcsel
15:27 jekstrand: Assuming you're running opt_peephole_select
15:27 jenatali: Hm, fair
15:27 karolherbst: why would we want to have bcsel instead?
15:27 jekstrand: karolherbst: You'll have a load, some bcsel to replace the one component, and then a store.
15:27 karolherbst: I don't
15:27 jekstrand: And NIR will get rid of most of the extra garbage.
15:28 karolherbst: not all hw needs the load
15:28 karolherbst: I can just write one component directly
15:28 karolherbst: that's totally fine
15:28 jekstrand: karolherbst: To a register?
15:28 karolherbst: shared memory
15:28 jekstrand: karolherbst: I'm not talking about shared memory. Shared memory won't get any lowering.
15:28 jekstrand: karolherbst: I'm talking about shader/function_temp
15:28 karolherbst: right.. but I wrote the patch for shared
15:28 jekstrand: The point is to get rid of ANY lowering for shared and just write components.
15:28 karolherbst: right
15:28 karolherbst: okay
15:28 jekstrand: karolherbst: Right. You wrote the patch to get rid of the internal lowering for shared.
15:28 karolherbst: but...
15:29 jekstrand: But we can't get rid of it for shader/function_temp
15:29 karolherbst: for indirect temp, I won't need them either
15:29 karolherbst: and direct temps doesn't have the issue anyway
15:29 karolherbst: *don't
15:29 jekstrand: So we either need to make it more precise in VTN somehow or we need to run lower_array_deref_of_vec on shader/function_temp
15:29 jenatali: Can't we just make it more precise in VTN?
15:29 jekstrand: We could
15:30 jenatali: That seemed simple, last I looked at it
15:30 karolherbst: okay..
15:30 karolherbst: so we just change it for shared mem, fix the fallout and move on?
15:30 karolherbst: (for now)
15:30 jenatali: karolherbst: was global okay? or was global broken too?
15:30 karolherbst: global doesn't hit this path afaik
15:31 karolherbst: let me check
15:31 jenatali: The global *test* doesn't hit that path
15:31 jekstrand: You could do if src->mode == vtn_variable_mode_function || src->mode == vtn_variable_mode_private) { /* do what we do today */ } else { just _vtn_local_loac_store() }
15:31 jenatali: Just wasn't sure if the global mem type was okay or not
15:31 karolherbst: jekstrand: yeah.. I guess that makes sense
15:31 jekstrand: Oh, and we should do it for _input and _output as well
15:32 karolherbst: mhhhh
15:32 karolherbst: I have a better idea
15:32 jekstrand: ?
15:32 karolherbst: just make the fast path for mem_shared and everything else stays the same?
15:32 jekstrand: I'm pretty sure the wrong path is being used for more than mem_shared today
15:32 jenatali: Like global
15:32 karolherbst: mhhh
15:32 jenatali: IIRC
15:32 jekstrand: yup
15:32 karolherbst: maybe
15:33 karolherbst: ohh...
15:33 karolherbst: we call vtn_local_store when emiting vstore/vload
15:33 karolherbst: mhhh
15:33 karolherbst: that sounds wrong
15:33 jenatali: I think it doesn't matter, since we only emit single-component loads/stores
15:33 jenatali: We don't use array-deref-of-vector, so it should be fine
15:33 karolherbst: mhhh
15:34 jekstrand: Ugh... This whole thing looks wrong to me
15:34 jenatali: :D
15:34 karolherbst: wait a moment, it will vanish
15:35 jekstrand: Maybe all we need to do is add shared to is_external_block
15:35 karolherbst: yeah...
15:35 karolherbst: that what I was thinking now
15:35 jenatali: And global
15:35 karolherbst: and global
15:35 karolherbst: yeah.. let me try that
15:36 jenatali: That was what I was thinking last time I looked as well
15:36 jekstrand: Ugh... THat helper means way too many things
15:36 karolherbst: ehhh...
15:38 jekstrand: I'm really starting to think that the right answer is nir_lower_array_deref_of_vec
15:38 jekstrand: Sorting this out inside vtn isn't looking fun
15:38 karolherbst: but then I'd prefer having drivers call into that, but that also means fixing a bunch of passes again
15:39 karolherbst: like for nouveau we won't need any lowering here
15:39 karolherbst: or maybe we shouldn't use scratch for indirect arrays...
15:39 karolherbst: and indirect vectors I guess
15:39 jekstrand: karolherbst: You really don't want to fall back to scratch just for an indirect vector deref
15:40 karolherbst: yeah... that's probably right
15:40 jekstrand: I guarantee a few bcsel is cheaper than memory access
15:40 imirkin: it'd be an interesting benchmark
15:41 bnieuwenhuizen: constant memory maybe, but scratch?
15:41 karolherbst: bnieuwenhuizen: for writing?
15:42 imirkin: not something i ever thought about - i was just happy it worked :)
15:42 karolherbst: well..
15:42 karolherbst: we could cache aggressively
15:42 bnieuwenhuizen: karolherbst: or for doing indirects on data that wasn't in constant memory in the first place?
15:42 karolherbst: bnieuwenhuizen: it's about indirect component writes into vectors
15:42 bbrezillon: jekstrand, jenatali, karolherbst: déja vu :)
15:43 jenatali: bbrezillon: Indeed :)
15:43 bnieuwenhuizen: I think it is very easy to blast through the cache with even miniscule amounts of scratch
15:43 karolherbst: jekstrand: I think the issue is bigger
15:43 karolherbst: the code as it's today will do those load/write things even for direct accesses
15:43 bbrezillon: jekstrand: last time you suggested to get rid of that local_load/store and add a lowering pass
15:44 jekstrand: karolherbst: Yes, it will.
15:44 bbrezillon: at least that's how I understood it
15:44 jekstrand: bbrezillon: The lowering pass already exists. :)
15:44 bbrezillon: right
15:44 karolherbst: but for direct it's really not the thing we want to have (even if nir is somehow able to clean it all up)
15:44 bbrezillon: that's what you said last time :)
15:45 karolherbst: for indirect sure.. if ladder or whatever is fine
15:45 karolherbst: but not for direct
15:45 karolherbst: although I guess that gets optimized away
15:45 karolherbst: mhhhh
15:46 karolherbst: ehh wait
15:46 karolherbst: no, it doesn't
15:46 karolherbst: we still have this RMW
15:46 karolherbst: so even in the direct case, single component write we end up with a bunch of code nobody needs
15:46 bbrezillon: and yes, local_load/store is being misused in a few places (like when it's called from vload/vstore
15:46 bbrezillon: )
15:47 karolherbst: mhhh
15:47 jekstrand: karolherbst: For shader/function_temp, NIR will get rid of that code
15:47 karolherbst: ahh, makes sense
15:48 jekstrand: It'll emit a pile of crap and NIR will turn it into vec()
15:48 jekstrand: And we can tweak the pass if needed to emit less garbage in the constant-index case
15:53 bbrezillon: karolherbst: BTW, I had started implementing what I understood of jekstrand's suggestion => https://gitlab.freedesktop.org/bbrezillon/mesa/-/commits/ms/cl-vstore-local-fix/
15:54 jekstrand: bbrezillon: I've got a patch. I'm trying to poke at a more pseudo-long-term solution.
15:55 bbrezillon: jekstrand: I thought that was the long term solution I had started to work on :P
15:56 jekstrand: bbrezillon: One second. :)
15:56 bbrezillon: it definitely doesn't work
15:56 jekstrand: bbrezillon: None of that is back-portable which is part of the problem.
15:56 bbrezillon: hm, ok
15:56 jekstrand: And we've got a but in all the Vulkan drivers today with shared
16:00 jekstrand: bbrezillon: https://gitlab.freedesktop.org/jekstrand/mesa/-/commits/jenkins_vulkan
16:01 jekstrand: bbrezillon, karolherbst, jenatali ^^
16:01 jekstrand: I think that fixes it for now
16:01 jekstrand: But, dang, the logic in vtn needs to be cleaned up!
16:02 jenatali: Indeed - but I'd be fine with taking that for the short-term
16:02 jenatali: Or even medium-term
16:02 karolherbst: jekstrand: yeah.. looks good
16:02 jekstrand: This'll all be much easier to clean up once we can delete the non-deref paths from spirv_to_nir
16:02 bbrezillon: jekstrand: hm, is it fixing the case where vload/vstore issue?
16:02 jekstrand: bbrezillon: I'm not sure about vload/vstore
16:02 karolherbst: bbrezillon: it's not a vload/vstore issue
16:02 bbrezillon: no
16:03 jenatali: bbrezillon: There is no vload/vstore issue, because they do single-component accesses
16:03 jekstrand: What's the vload/vstore issue?
16:03 bbrezillon: it's vload/vstore calling the wrong function
16:03 karolherbst: but you mean the issue hitting with vload/vstore? yes :p
16:03 karolherbst: bbrezillon: it's not a vload/vstore issue :p
16:03 jekstrand: We have too many different functions. That's the vload/vstore issue. :)
16:03 karolherbst: yes, the test fail
16:03 jenatali: I agree, they should be calling a different function, but it's not currently causing a problem
16:03 karolherbst: but not because of vload/vstore
16:04 karolherbst: jekstrand: will try your patch here
16:05 karolherbst: jekstrand: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6330 needs poking agian
16:05 karolherbst: ehhh.. ci
16:06 karolherbst: seems like the GPU broke?
16:06 bbrezillon: ok, maybe it's me not understanding what the local stands for in the vtn_local_{load,store}() functions then
16:06 jekstrand:is getting really tired of panfrost CI blowing up and blocking marge
16:06 karolherbst:was thinking about wiring up some nouveau stuff blowing up in the kernel randomly :p
16:08 jekstrand: Whatever lava setup the panfrost CI uses; it's not reliable.
16:08 karolherbst: well... the GPU can crash when running tests
16:08 jekstrand: That's what reset is for
16:09 karolherbst: right
16:09 karolherbst: I guess it should be prepared for rebooting the entire machine
16:11 jekstrand: karolherbst: Also, restrict your test set to tests that don't do that. :)
16:12 karolherbst: :D
16:12 karolherbst: if that would be only that simple :p
16:14 jekstrand: karolherbst: If you're going to have issues with needing to reboot the whole machine more than 1 out of ever 25 CI runs, please don't put it in CI. Fix the driver first.
16:14 karolherbst: I should test post merge !6330.. I waited for that to land in order to be able to fix other stuff.. but let's see
16:14 karolherbst: jekstrand: not planning to :p
16:17 karolherbst: jekstrand: patch works for me
16:18 jekstrand: karolherbst: Cool. I'll post an MR once Intel CI comes back.
16:18 jekstrand: I don't expect any problems but that code is.... Not great.
16:29 jekstrand: Ugh... All this array-deref-of-vec stuff is going to have interesting implications for generic pointers. :-(
16:30 jenatali: Ooh, good point
16:30 jekstrand: It's handleable, I think. You just have to lower_array_deref_of_vec any time opt_deref makes progress.
16:31 jenatali: Hm, yeah - as long as other passes use equality instead of bitmask checks on deref modes
16:32 jenatali: But that's probably just a matter of making sure passes either support array-deref-of-vector, *or* only try to optimize derefs that have single modes, and fixing any that get it wrong
16:38 jekstrand: Right now, I'm changing nir_deref_instr::mode to modes and using it as an excuse to audit every use of deref modes in NIR.
16:40 jenatali: :)
16:40 jenatali: That's a change that won't be painful to rebase onto, no sir :P
16:40 jekstrand: :)
16:46 jekstrand: Lots of little decisions about when to use `deref->modes != mode` vs. `!(deref->modes & modes)` vs. `(deref->modes & ~modes)`
16:46 karolherbst: ehhh.. constant memory is annoying :/
16:56 jenatali: karolherbst: What's the problem?
16:57 karolherbst: I guess I lower it incorrectly
16:58 jenatali: Would someone be willing to Marge https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6309 ?
17:01 vanfanel: What would be the "right" way to get two planes (a PRIMARY and a CURSOR plane) to get updated independently? Until now, I was relying on a "central" atomic commit on the buffer swapping function of the PRIMARY plane, but I have seen that some programs don't do pageflips in certain situations where mouse cursor movement is still wanted.
17:05 daniels: jekstrand: on what grounds have you concluded that Panfrost is the cause of all CI issues?
17:06 daniels: jekstrand: over the past 7 days, we've had one (1) kernel lockup above, yesterday two pipelines failed due to a Sunday-morning infrastructure issue (immediately disabled and correctly so)
17:07 daniels: in the same timespan, we've had the same softpipe test flake three times, one virgl test flake twice (asked virgl people to add it to the flake list), two instances of mingw failing at running glcpp tests, and 14 freedreno infrastructure fails (takes an hour to fail at turning on)
17:09 daniels: I gave up counting how many times it failed because 'branch cannot be merged', but it felt like around 30-35; that's not something anyone should ever be seeing since I deployed a patched marge-bot this afternoon
17:09 daniels: but that's, what, 1/20th of the overall failures, on a bad week?
17:12 jekstrand: daniels: I don't know where you're getting those statistics.
17:12 daniels: jekstrand: from GitLab
17:13 jekstrand: My entirely subjective experience is that it's usually panfrost
17:13 daniels: well ... it's not :)
17:13 daniels: see for yourself - scroll through https://gitlab.freedesktop.org/users/marge-bot/activity and every time you see a comment balloon, that's a failure. click through and see what caused it.
17:15 jekstrand: daniels: I may be blaming the wrong bit of CI.
17:15 jekstrand: daniels: However, if you scroll through that list, the ratio of bubbles to merge icons is far too high.
17:17 karolherbst: jenatali: mhh.. somehow the ubo stuff is a bit weird... so for a 32 bit device, you want to either use 32bit index_offset, right?
17:17 karolherbst: ohhh... I know what I forgot
17:17 karolherbst: I need to fix up the wrapper
17:17 jekstrand: Unless people are prematurely assigning MRs to stuff with real failures and/or jumping the marge queue (in which case marge should rebase and try again), the ratio should be 1:10 or 1:20, not 1:2-3.
17:17 daniels: jekstrand: I know! that's why I'm trying to fucking fix it!
17:18 jenatali: karolherbst: Dunno, we gave up on 32bit support. I'd expect it to be just 32bit_global
17:18 jekstrand: daniels: Ok, then I'll let you fix it. :-)
17:18 karolherbst: yeah... let me get 64 bit to work first
17:18 jekstrand: daniels: Sorry for all my grumbling. I've just had a lot of trouble with marge lately.
17:19 daniels: jekstrand: help welcome, but tbh it's pretty demotivating to notice that apparently everyone thinks Panfrost is unreliable rubbish (I went back a few more days trying to find another Panfrost failure; found fdno/softpipe/virgl/glcpp/i965-tools/fdtools failures but no more), and also pretty annoying to notice the number of failures where people just repeatedly re-assign without even noting the failure let alone fixing
17:19 jekstrand: daniels: Both of those are fair comments.
17:20 daniels: (to your earlier points about Panfrost's actual CI setup: yes, we know about GPU reset and do it, this is the first time I've seen it fail; yes we know about blacklisting tests which is why we're aggressive about doing it; yes we know about monitoring failures which is why we do it and also why we have so few ...)
17:20 jekstrand: Subjectively, it feels like it's usually Panfrost for me but I've not done any statistics.
17:20 jekstrand: And that's based on a very scatter-shot group of MRs across a month or so
17:21 jekstrand: Actual statistics are better and far more useful for solving general CI instability.
17:22 jekstrand: Also, it's usually not the Panfrost driver; it's usually lava or someone failing to push results somewhere.
17:22 jekstrand: I've not seen the driver flake in a while.
17:22 jekstrand: (It could very much not be lava; I don't know how results collection actually works)
17:23 jekstrand: Anyway, enough of my unhelpful complaining.
17:24 daniels: here's the last Panfrost failure before today's GPU timeout + yesterday's infrastructure damage, which was on the 30th of July https://gitlab.freedesktop.org/anholt/mesa/-/jobs/3881058
17:26 jekstrand: Ok, point taken. Sorry, I'm really not trying to argue against Panfrost.
17:26 daniels: I know we had a _ton_ of failures around network/infrastructure issues for a while usually manifesting in silent timeouts or machines not booting; those have been fixed after a lot of rewiring and a lot more sharding for additional redundancy (though we still can't get a new network link physically installed because social distancing hooray)
17:26 daniels: the artifact upload thing is super annoying too, but we chased that upstream now and that doesn't happen anoy
17:26 daniels: *anymore
17:26 jekstrand: Ok, cool
17:27 karolherbst: jenatali: you are also handing in a variable index, aren't you?
17:27 daniels: if the last 3 weeks is representative, then >50% of our failures are gone as of this afternoon, which would make ~2/3rds of failures the freedreno never-powers-on thing; I'm not entirely sure who's signed up for fixing that
17:27 daniels: but yeah, if you do spot things, please do ping me, or file a new issue tagged CI, or ... something?
17:27 jenatali: karolherbst: Not sure what you're asking
17:27 karolherbst: jenatali: for constant kernel arguments
17:27 karolherbst: what data does the runtime provides and what is known at compile time
17:27 jekstrand: daniels: Cool. That sounds like it shoule help a lot.
17:28 daniels: that or just expose the next new bogeyman :P
17:28 jekstrand: daniels: I like marge. Really, I do! It used to be great. It just sounds like we've had a bit of a perfect storm.
17:28 jenatali: karolherbst: Ah. We end up converting all __constant => __global, so the runtime provides the full pointer to the constant region
17:28 jenatali: For kernel args, nothing's known at compile time
17:28 daniels: jekstrand: tbh it's more like summer neglect than a perfect storm, but yeah
17:28 karolherbst: jenatali: mhhhh....
17:29 jekstrand: daniels: So, a perfect storm caused by global warming because humans are idiots?
17:29 karolherbst: why does constant* have to be the most painful address space? :D
17:29 daniels: jekstrand: London's been awfully Texan this summer :\
17:31 jekstrand: daniels: 35-40C every day?
17:31 daniels: jekstrand: 6 consecutive days of >=34C, yeah
17:31 karolherbst: well, like the rest of europe as well :p
17:31 karolherbst: although I guess further north it wasn't _that_ bad
17:37 daniels: karolherbst: yes and no - the problem here isn't absolute temperature (hottest I had in Melbourne was 46.5C, and that was more tolerable than the last couple of weeks here), just that UK buildings are designed to trap as much heat as possible and never let it escape - that problem's kind of magnified in the nordics :P
17:38 imirkin: IME humidity is a giant factor in all this too... i've been in vegas with 40 degC+ temps, and it was fine outside, since it was so dry
17:39 daniels: yeah exactly - hot Melbourne is brutally hot but also bone-dry, whereas it's been more like Singapore here
17:39 imirkin: whereas the US east coast, with like 150% humidity, even 35C is death to me
17:39 jekstrand: Yeah, that hot in London is probably pretty brutal. Austin is massivly better than Houston (I've lived in both) because Houston is on the Gulf and Austin is 300 miles inland.
17:39 daniels: jekstrand: anyway there's my flimsy excuse for being so frustrated - sorry for taking it out on you, it's just depressing to see a lot of people mashing retry and just letting their own frustration build up over easily solved issues
17:40 jekstrand: daniels: Yeah. For us consumers of CI, they're not so easily solved. :-(
17:40 jekstrand: But, yeah, I feel it.
17:40 jekstrand: It's the classic, "I know, I'm working on it!" problem.
17:40 daniels: jekstrand: it's not magical though - it's super easy to reason about and get involved
17:51 daniels: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6352 blacklists flaky softpipe + virgl tests if anyone wants to ack
17:53 karolherbst: jenatali: how are you dealing with the constant buffer btw?
17:53 karolherbst: are there any tests testing that stuff?
17:53 jenatali: Testing "that stuff"?
17:53 karolherbst: this spilling of kernel constant
17:53 jenatali: Ah
17:54 jenatali: I'm not sure to be honest, we had it implemented with local functional tests long before we had the compiler + runtime hooked up well enough to start running the CTS
17:54 karolherbst: and I already see me adding nir_address_format_24bit_index_8bit_offset_pack32
17:55 jenatali: I do know that test_compiler hit a bug in our WARP software driver trying to access spilled constants in the preprocessor test
17:55 karolherbst: jenatali: I am mildly aware :D
17:55 jenatali: Trying to copy __FILE__
17:55 karolherbst: ehhh
17:55 karolherbst: more itnerested in not having to write my own test though
17:56 karolherbst: and I need to tell clover to emit 32/32_pack64 now instead of it's 24/8_pack32 thing... which well.. works as long as you only have one constant buffer
17:59 jenatali: karolherbst: Unfortunately I'm only really aware of which tests target specific features when they were painful for me :P we already had constants working before running the CTS, so I'm not really sure which tests would stress it
18:30 karolherbst: argh...
18:30 karolherbst: right..
18:30 karolherbst: clover doesn't do constant buffers...
18:30 karolherbst: I totally forgot
18:31 jenatali: Heh, that seems like a problem
18:31 karolherbst: well.. it tries to
18:31 karolherbst: but..
18:31 karolherbst: it binds a surface, not a constant buffer
18:31 karolherbst: :/
18:31 karolherbst: because....
18:31 karolherbst: reasons
18:31 jenatali: Huh
18:31 karolherbst: there was a time where gallium was less sane
18:31 karolherbst: and stuff just was made to work
18:32 karolherbst: curro_: any reasons we can't use the constant buffer stuff in clover?
18:33 karolherbst: meaning pipe->set_constant_buffer
18:33 sravn: mripard, mlankhorst can we get a backmerge of -rc1 to drm-misc-next? Need dev_err_probe() for a panel patchset
18:38 jenatali: karolherbst: Out of curiosity, where is image stuff (!5242) on your backlog? No rush I think, just curious
18:38 karolherbst: well.. below the other CL stuff I don't mind not supporting images for a while, but yeah.. I should try to get some things working..mhh
18:39 karolherbst: but there is also not much left
18:39 karolherbst: constant and offsets are probably the biggest things
18:39 karolherbst: so let's see how that goes
18:39 jenatali: Cool, sounds good
18:40 jenatali: Offsets (!5891) should be ready for you to integrate as well
18:42 karolherbst: yeah.. I am mainly wondering on how to support it from clover without messing around too much
18:42 jenatali: Makes sense
18:43 karolherbst: adding arguments is kind of messy, but maybe not _that_ terrible
19:18 jenatali: anholt: If you think it's good, would you be able to assign https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6309 to Marge?
19:19 anholt: done!
19:19 jenatali: Thanks :)
19:19 anholt: we need to give you commit access
19:19 jenatali: You wouldn't hear me complaining :P
19:29 karolherbst: soo.. constant working now :)
19:30 jenatali: \o/
19:30 karolherbst: but still.. all lowered to VRAM :D
19:30 jenatali: Baby steps :P
19:30 karolherbst: I guess I could hack up the wrapper a little mhhhhh
19:30 karolherbst: clover does index them _anyway_
19:30 karolherbst: so the first one is first, etc...
19:31 karolherbst: but it does seems like there is some offsetting going on
19:32 karolherbst: jenatali: there is even some magic we can do for 32 bit...
19:32 karolherbst: soo...
19:32 karolherbst: the hw allows overflows into the next buffer
19:32 jenatali: Huh
19:32 karolherbst: lower 20 bits are the offset
19:32 karolherbst: higher 12 bits are the index
19:32 karolherbst: so if you have c2[0x100204] this means c3[0x204]
19:32 karolherbst: or so
19:33 karolherbst: I guess you can't
19:33 karolherbst: but... I wouldn't be surprised if there is a way of doing that in dxil anyway?
19:33 karolherbst: but probably not
19:33 jenatali: Yeah no, DXIL CBs have a max size of 64KiB, no overflow
19:33 karolherbst: right
19:34 jenatali: Though like I said, we ended up switching constant memory to SSBO/UAV anyway just to improve driver stability when dealing with non-uniform access
19:34 karolherbst: maybe it was 16/16 bit for us?
19:34 karolherbst: imirkin: do you know?
19:34 karolherbst: jenatali: eh.... yeah.. that's pain
19:49 karolherbst: mhhh.. nir can't constant fold unpack_64_2x32_split_y in combination with and :/
19:50 karolherbst: or maybe I messed up
19:51 jenatali: It should be able to fold those both?
19:52 karolherbst: jenatali: https://gist.github.com/karolherbst/b446179e1a656712580fa38647875c5e
19:52 karolherbst: no matter how you look at it, ssa_45 is 1 :)
19:52 jenatali: Ah
19:52 karolherbst: but yeah..
19:52 karolherbst: maybe a bit too complex
19:52 jenatali: That's not constant folding, it's just optimizations
19:52 karolherbst: right...
19:53 karolherbst: maybe I should create the value differently
19:53 karolherbst: maybe use pack instead?
19:53 karolherbst: ... I guess
19:53 jenatali: Yeah, pack -> unpack should optimize out correctly
19:54 karolherbst: let's see
19:55 robclark: daniels: I'm seeing a lot more "CI failed!" than "CI is taking too long." (the latter would be the tftp boot fail on the freedreno runners).. maybe it was worse over the weekend (since it seems to happen more when the temp is high.. and it was pretty hot last couple days)
19:55 curro_: karolherbst: yeah it would be good to use set_constant_buffers instead of RO surfaces
19:55 karolherbst: curro_: okay :)
19:55 karolherbst: I already have it working, was mainly wondering what you think about it
19:55 robclark: that said, no idea why that is happening.. not that we have physical access to the CI farm in the first place..
19:56 karolherbst: curro_: side note: we might want to remove all unbinding in clover as gallium drivers are expected to just deal with it, but that's for alter I guess
19:56 karolherbst: just something I keep thinking about
19:57 karolherbst: jenatali: sucssess!
19:57 jenatali: :)
19:57 imirkin: karolherbst: was your question resolved?
19:57 imirkin: (was on a call, about to hop on another one...)
19:57 karolherbst: jenatali: simple fma example having three constant sourceS: https://gist.github.com/karolherbst/04dc1df7b181b13e79af5541db8a41ba
19:58 karolherbst: imirkin: more or less.. I was wondering how this ubo overflowing works
19:58 curro_: karolherbst: main reason i didn't bother to do that before is that it's largely equivalent to real CBs (like, the driver can implement them as constant buffers anyway), but kind of quirky yes
19:58 karolherbst: but I also don't plan on using it
19:58 imirkin: karolherbst: there are two LDC ops
19:58 karolherbst: curro_: yeah.. but it gets annoying if you support SVM as well
19:58 imirkin: or rather, one LDC op
19:58 imirkin: if you load out of bounds, you'll get 0
19:58 karolherbst: or drivers only support 6 ubos in computer shaders :)
19:58 karolherbst: *supporting
19:58 imirkin: if you have the "IS" flag set, it will take the upper 16 bits and add them to the index
19:58 karolherbst: but yeah...
19:59 karolherbst: jenatali: I am wondering if we want to do a vec2 thing instead... mhhh
19:59 karolherbst: should help with more indirection stuff
19:59 jenatali: vtn really doesn't like emitting vec2s
19:59 jenatali: Or dealing with uint->ptr casts for vec2s
20:00 karolherbst: yeah.. it's annoying
20:00 karolherbst: but I guess pack/unpack is good enough
20:00 karolherbst: and in vtn I could say: either we emit constants for the index, or runtime provided
20:00 karolherbst: or so..
20:00 karolherbst: dunno yet
20:01 karolherbst: we might want to have a 16/16 split for 32 bit devices.. mhh
20:01 karolherbst: mhhh
20:01 karolherbst: oh well
20:03 curro_: karolherbst: hum, set_constant_buffer doesn't seem particularly SVM-ready either. honestly the current approach where constant buffers are just plain surfaces seems more straightforward in that case
20:07 karolherbst: curro_: well, SVM passes in the host ptr :)
20:07 karolherbst: no way any hardware would support it trhough constant buffer
20:08 karolherbst: I think nvidia even lowers all contant buffer loads to global loads with a constant instruction modifier
20:08 curro_: yep
20:08 karolherbst: but they stopped doing it
20:08 karolherbst: or it only happens at runtime or so
20:08 karolherbst: but I also don't want to hurt kernels in the general case.. it's annoying
20:10 karolherbst: but even with CL2.0 you can say which arguments won't be SVM pointers
20:10 karolherbst: mhhhh
20:10 karolherbst: and with CL3.0 all of that becomes more optional
20:10 bbrezillon: mattst88: can I have your R-b on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6290 ?
20:11 airlied:smashes gl 4.5 at marge
20:11 bbrezillon: unless you see a better solution to fix the issue
20:11 mattst88: bbrezillon: I haven't looked at the details, but feel free to put an Acked-by on it and merge it
20:12 bbrezillon: mattst88: ok, thanks
20:12 ajax: airlied: nice work with vallium btw
20:13 airlied: ajax: thx!
20:15 airlied: dcbaker: hey, I'd like to get the two fixes + llvmpipe GL4.5 enable into 20.2.0 if possible (just let me know if there are any issues with that)
20:15 airlied: I'm hoping to submit 20.2.0 for conformance
20:15 karolherbst: curro_: ehh.. I guess you could also use a host_ptr for backing the constant memory :/ mhhh but I think this is somewhat supported already
20:16 dcbaker[m]: airlied: at long as none of the other llvmpipe people object, it's fine with me
20:17 dcbaker[m]:is basically fine with driver people pushing whatever into their drivers, it's common code /me worries about
20:17 airlied: dcbaker[m]: it might need a release notes tweak is the only thing
20:17 curro_: karolherbst: IIRC the problem with SVM constant buffers was that you can indeed specify which constant arguments are SVM and which ones aren't via the CL2.0 API, but there is no way for the compiler to know that beforehand, so we likely need to treat them all as SVM anyway if the device supports it at all...
20:17 karolherbst: yeah...
20:18 karolherbst: or compile variants
20:18 curro_: 2^n kernel variants just in case? :P
20:18 karolherbst: well.. on demand
20:18 karolherbst: we do this for graphics anyway
20:18 karolherbst: more or less
20:18 karolherbst: and backends usually cache
20:18 karolherbst: but yeah...
20:18 karolherbst: it's annoying
20:18 curro_: still, the number of combinations can explode pretty quickly
20:18 karolherbst: right...
20:18 jenatali: Yeah, we have to recompile variants at enqueue time based on e.g. local parameter size, or sampler properties
20:19 dcbaker[m]: airlied: that's cool, we don't actually generate release notes till the .0 release anyway
20:19 karolherbst: but it's not our fault the CL API sucls :p
20:19 karolherbst: *sucks
20:20 karolherbst: curro_: a bit hacky, but: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/63fde92fd22a8096da6968d0e1673bcf1e6864f6
20:24 karolherbst: jekstrand: do you think we really need a mem_constant which is something between ubos and mem_global? like it is just VRAM, but constant memory and compilers can do aggressive cachine, and we can do more opts and shit?
20:24 karolherbst: would also be interesting to know about potential runtime improvements
20:24 curro_: karolherbst: seems on the right track. i guess this breaks the pipe driver API though, probably some pipe driver and compiler fixes are required
20:24 karolherbst: yeah...
20:24 karolherbst: I think there is some cap for the uniform cb0 thingy
20:25 karolherbst: or so
20:25 jenatali: karolherbst: Couldn't Clover put the UBO indices in the kernel arg buffer? Instead of forcing vtn to do it?
20:25 karolherbst: jenatali: with a pass? mhhh, not really
20:25 karolherbst: or at least not if we create the wrapper in vtn
20:26 curro_: hm? i thought clover was passing in the CB indices as arguments already
20:26 jenatali: You're already loading a uint64, and then overwriting the high bits
20:26 karolherbst: cl is stupid and it is kind of designed in a way, that only makes sense if you assume all kernels have an implicit wrapper where the driver does weird stuff
20:26 karolherbst: jenatali: but then I don't know the value at compile time
20:26 jenatali: Oh
20:26 karolherbst: curro_: at runtime though, so we have to lower to global memory _always_
20:26 jenatali: You need to know it at compile-time?
20:26 karolherbst: yes
20:27 karolherbst: indirect ubo access on nv hardware in computer shaders -> global_mem
20:27 karolherbst: uhm
20:27 jenatali: Ah, got it
20:27 karolherbst: indirect ubo index
20:27 karolherbst: yeah..
20:27 karolherbst: it's annoying
20:27 curro_: karolherbst: which is what you said nVidia was doing anyway, right? doesn't sound terrible
20:27 jenatali: Still seems doable in a pass after the fact FYI
20:27 karolherbst: curro_: not by default
20:27 karolherbst: if I just compile kernels they do constant memory
20:28 karolherbst: but I know there is a flip somewhere
20:28 karolherbst: they also officially only support 1.2
20:28 karolherbst: so...
20:28 karolherbst: jenatali: ohh, right..., nope, you are right, I was stupid :D
20:28 jenatali: Just a matter of tagging the variable, and tweaking the load_deref to overwrite the high bits
20:29 karolherbst: yeah..
20:29 karolherbst: something
20:29 jenatali: We've got tons of post-processing of input args we're doing for CLOn12 :D
20:29 karolherbst: I can imagine :D
20:30 karolherbst: maybe I should force nvidia using 10 buffers and see what they are doing then :D
20:30 karolherbst: but normally ptx doesn't deal with this kind of annoying crap and just bails
20:30 karolherbst: and their clc to ptx translator also does weird stuff
20:57 karolherbst: ehhh.. test_basic constant_source doesn't work :/
20:57 karolherbst: and kernel_memory_alignment_constant mhhh
20:57 karolherbst: why does it suck so much :(
20:58 karolherbst: jenatali: what do you have to make those compile?
20:58 jenatali: Uh
20:58 jenatali: What's wrong with them?
20:58 karolherbst: that's the "without_array->block" assert
20:59 karolherbst: so you have a constant array declarations inside the kernel
20:59 karolherbst: and no buffer bound of course
20:59 karolherbst: I think you lower to the constant table stuff, no?
20:59 jenatali: Yeah
21:00 jenatali: Where's the assert you're hitting?
21:00 karolherbst: ../src/compiler/spirv/vtn_variables.c:2122
21:00 jenatali: Hm, let me look at what's in our tree
21:01 jenatali: Oh that assert's just not there :)
21:02 karolherbst: heh?
21:02 karolherbst: it's inside vtn_create_variable
21:02 karolherbst: the mode_ubo case
21:03 jenatali: Yeah, we don't have it in our tree
21:03 karolherbst: ahh :D
21:03 jenatali: I suspect it was added after we forked, since the last time we rebased
21:03 karolherbst: I guess you removed it?
21:03 karolherbst: ohh
21:03 karolherbst: I see
21:03 jenatali: Let me check
21:03 karolherbst: well, removing it just fails later :p
21:04 jenatali: Hm, no, guess we did remove it
21:04 karolherbst: I know you were hitting this issue in the past :D and I am sure you have some patches for that
21:04 jenatali: Let me look
21:04 jenatali: Btw karolherbst, airlied: Looks like SPIR-V targets for libclc just landed upstream :)
21:05 jenatali: https://gitlab.freedesktop.org/kusma/mesa/-/commit/c6de20bdb6de55fd20919bab17c2251a191d46c8
21:05 jenatali: karolherbst: Where's the failure if you remove the assert?
21:06 karolherbst: jenatali: test_basic: ../src/compiler/nir/nir_lower_io.c:1142: nir_explicit_io_address_from_deref: Assertion `deref->mode & (nir_var_shader_in | nir_var_mem_shared | nir_var_shader_temp | nir_var_function_temp)' failed.
21:06 airlied: jenatali: yay!
21:07 jenatali: Ahhh
21:07 jenatali: karolherbst: That pass doesn't know how to generate an address for a deref_load of a ubo variable, yeah
21:07 jenatali: Let me find the lowering pass we run
21:07 karolherbst: yeah...
21:08 karolherbst: thanks :)
21:09 airlied: we shouldn't be far off being able to use llvmpipe to run the piglit CL tests now
21:09 airlied: in CI
21:09 airlied: though not sure how to get all the llvm stuff built for CI, might have to wait a bit for things to get released
21:10 jenatali: Yeah, makes sense
21:10 karolherbst: airlied: cool :)
21:10 jenatali: airlied: Would love your review on my extensions to your libclc series
21:11 jenatali: karolherbst: After we attempt to promote all inline constants to shader_temp (which we map to DXIL's built-in constant buffer functionality), we convert remaining ubo -> ssbo, and then run this pass: https://gitlab.freedesktop.org/kusma/mesa/-/blob/msclc-d3d12/src/microsoft/compiler/dxil_nir.c#L864
21:11 jenatali: Before lower_explicit_io
21:13 karolherbst: jenatali: mhhhh
21:13 jenatali: karolherbst: You should be able to adapt that into something similar for ubos
21:13 karolherbst: wondering if making lower_big_constant or whatever the pass was called might be more straighforward
21:14 karolherbst: having nir_shader.constant_data as an implicit __constant* kernel arg somehow does sound very nice
21:15 karolherbst: or let the driver handle that
21:15 karolherbst: dunno
21:17 airlied: jenatali: will try look a the mesa MR today
21:17 jenatali: airlied: Thanks!
21:24 karolherbst: where is the official libclc git tree? :D
21:26 airlied: karolherbst: llvm
21:27 kisak: https://github.com/llvm/llvm-project/tree/master/libclc
21:42 karolherbst: ahhh
21:43 karolherbst: how much boilerplate code that libclc is..
21:43 jenatali: Yeah... but they've got good implementations of a lot of functions I wouldn't want to have to reimplement :)
21:44 karolherbst: pmoreau: guess we need to support spir-v 1.1 now :p
21:44 karolherbst: yeah
21:45 karolherbst: so by llvm-12 we can have proper upstream opencl support in mesa? :D yay
21:45 jenatali: Looking that way :)
21:46 karolherbst: yall forcing me to compile my own version of llvm now :( *sadface*
21:46 karolherbst: uhm.. mhh wait
21:46 karolherbst: libclc shouldn't have any runtime deps
21:47 karolherbst: so it's just for building it.... mhhh
21:47 jenatali: Yep, it outputs a .spv
21:47 karolherbst: mhhh...
21:47 karolherbst: shouldn't be too bad then
21:48 jenatali: We've still got 2 outstanding PRs against LLVM FYI, one to add a few more functions, and the other to allow after-build toggling of hardware vs software FMA usage
21:48 jenatali: But they're tiny and hopefully shouldn't be too contentious?
21:49 karolherbst: mhhh fma...
21:49 jenatali: Yeah, DXIL doesn't have an fma
21:49 karolherbst: ....
21:49 airlied: karolherbst: yeah it should be possible to build libclc on it's own
21:49 karolherbst: oh god why
21:49 jenatali: So we need to force libclc to use paths that don't require fma's increased precision
21:49 airlied: just needs a clang + spirv-llvm-translaator
21:49 karolherbst: nir doens't have a split as well
21:49 karolherbst: but really should have
21:49 karolherbst: like, really
21:49 jenatali: A split?
21:50 karolherbst: fma and fmad
21:50 jenatali: Ah
21:50 karolherbst: or whatever
21:50 karolherbst: uhm
21:50 karolherbst: ffma and fmad
21:50 jenatali: I thought it did have both...
21:50 karolherbst: uhm...
21:50 karolherbst: wait
21:50 karolherbst: I think you are right
21:50 jenatali: DXIL's just got fmad, which is allowed to be implemented as ffma or fmad
21:50 karolherbst: it just optimizes whatever it pleases or so?
21:50 karolherbst: mhh
21:50 karolherbst: let's see
21:51 karolherbst: nope
21:51 karolherbst: only ffma
21:51 jenatali: Huh
21:51 karolherbst: in nouveau I did the dirty trick of telling nir to never ever optimize fmul+fadd to ffma
21:52 karolherbst: but keep all ffmas and not lower them
21:52 karolherbst: that works reasonably good
21:52 karolherbst: but ultimately I'd like to have both variants
21:53 anholt: any c spec wizards able to explain the ubsan error in https://gitlab.freedesktop.org/anholt/mesa/-/jobs/4126764? line is "d |= *(ptr++) << 24u;", d is u32, ptr is u8, 24u is u, where is "int" coming from?
21:53 karolherbst: airlied: I think the translator does the right thing here, no?
21:53 karolherbst: I think there was something stupidly broken
21:53 karolherbst: or the translator trying to be smart
21:55 airlied: karolherbst: not sure what translator does there actually
21:55 karolherbst: isUnfusedMulAdd ... magic
21:55 airlied: anholt: *(ptr++) getting promoted to an int?
21:55 karolherbst: but yeah...
21:55 karolherbst: fmad vs ffma handling always sucks :/
21:56 karolherbst: sadly
21:56 karolherbst: just have two ops
21:56 karolherbst: :D
21:56 anholt: airlied: presumably, I'm just trying to figure out where that promotion comes from
21:56 airlied: I think the warning is it's not defined
21:57 airlied: the compiler might use an unsigned int
21:57 airlied: but it can use an int
21:57 anholt: airlied: you're thinking that the type of the expression is int because the dst is 32 bits while the src was only 8 bits shifted left by <something>?
21:57 jenatali: karolherbst: What it comes down to is that CL requires a high-precision ffma, and libclc has an implementation of ffma (for float32), but other functions like sin/cos have shorter implementations which leverage ffma, or slightly longer ones which don't - much more efficient than assuming ffma exists and having to use the crazy software version when it doesn't
21:57 imirkin: anholt: perhaps u8 can get promited to i32?
21:58 imirkin: anholt: what if you do d |= (u32)*(ptr++) << 24 ?
21:58 anholt: that was my next thought. I do seem to recall something stupid about c's type promotion rules when going from when smaller to larger types.
21:59 karolherbst: jenatali: not saying one should use the crazy software version.. just that ... mhhh.. annoying
21:59 karolherbst: the annoying thing is just, that hw usually doesn't support both natively
21:59 imirkin: anholt: in my mind, the first rule of C type promotion is that there are no rules.
21:59 karolherbst: but at least on nv, we have a native ffma and if we need to emit a precise fmad we emit fmul + fadd
21:59 anholt: principle of most surprise.
21:59 airlied: anholt: u9 to i932 promotion
21:59 airlied: doh
21:59 karolherbst: but yeah..
21:59 airlied: u8 to i32
21:59 karolherbst: it's annoying
22:00 anholt: airlied: my whole question is "Where does that 'i' come from when every other things is 'u')
22:00 imirkin: anholt: yeah, which means it changes over time, as you start expecting things :)
22:00 airlied: anholt: that is where the undefined comes in
22:00 airlied: the spec doesn't say it won't be an i
22:01 jenatali: airlied: My read of the spec I thought would require it to be unsigned
22:01 jenatali: Oh nvm, this section of the spec doesn't apply to shifts
22:01 airlied: "Arithmetic operators do not accept smaller-than-int integers. Unsigned char
22:02 airlied: gets promoted to int and there is no overflow in arithmetic."
22:02 airlied: is from an issue I found
22:02 jenatali: Integer promotion is the implicit conversion of a value of any integer type with rank less or equal to rank of int or of a bit field of type _Bool, int, signed int, unsigned int, to the value of type int or unsigned int
22:02 jenatali: If int can represent the entire range of values of the original type (or the range of values of the original bit field), the value is converted to type int. Otherwise the value is converted to unsigned int.
22:02 airlied: yup so int happens there
22:02 airlied: since it can repesent all of unsigned char
22:02 airlied: then you shift it and boom
22:02 jenatali: Right
22:03 anholt: that's different awful than I was even imagining! thanks everyone, that u32 cast should hopefully appease ubsan, then.
22:03 jenatali: Why shifts use "integer promotion" instead of "usual arithmetic conversion" (at least using the titles from https://en.cppreference.com/w/c/language/conversion) I don't understand at all
22:55 jenatali: karolherbst: Think we should just Marge !6355?
23:22 jekstrand: karolherbst: I think the biggest advantage to a constant storage class is that you know a priori that loads can be CSEd
23:23 jekstrand: And you don't have to worry about writes or barriers from other storage classes.
23:26 airlied: make divergance analysis a lot easier
23:28 jekstrand: Divergence analysis doesn't really change. A load from global, ssbo, or shared with uniform inputs will have a uniform output too.
23:29 airlied: I suppose you get to assume nobody is writing to global or ssbos :-P
23:29 airlied: jenatali: I think the clover bits probably need some review on that libclc MR maybe from karolherbst
23:30 airlied: jenatali: you might also want to slap some r-bs on my patches
23:30 jenatali: Yep, can do, and makes sense
23:31 airlied: otherwise I'm fine with all of the other stuff
23:32 jenatali: Awesome, thanks :)
23:32 airlied: I'm pondering if we should just ship a backup clc lib with mesa for CI purposes
23:32 airlied: just don't have much insight into how much churn is in there at present
23:33 jekstrand: So here's a question for the CL people. Let's say I want to implement some Intel extensions that involve magic intrinsic functions. How would you say we should go about it?
23:33 jekstrand: One obvious option is to have code in spirv_to_nir which detects the name and types (or maybe mangled name?) and implements OpCall with something else.
23:33 jenatali: airlied: Not a bad idea, it's only a few MB. Though I haven't really seen much churn in the last ~year or so that I've been looking at it
23:34 jekstrand: Is that going to cause clover heartburn?
23:34 jenatali: jekstrand: Does the Intel extension result in SPIR-V opcodes, or just calls to unimplemented functions?
23:34 jekstrand: jenatali: Calls to unimplemented functions for most of them. :-/
23:35 jekstrand: jenatali: I'm trying to push things towards opcodes but there's a lot of legacy from the pre-SPIR-V days
23:35 jenatali: jekstrand: You might want to take a look at the libclc series - there's a lowering pass in there which essentially turns "call to unimplemented function" into "inlined function" by using NIR provided by an external shader
23:36 jekstrand: jenatali: Does it do so as part of SPIR-V parsing or later?
23:36 jenatali: The "call to unimplemented function" currently all comes from an OpCode => mangled name conversion
23:36 jenatali: jekstrand: The call is generated as part of SPIR-V parsing, the inlining from external shader comes after
23:36 jekstrand: jenatali: Ok, so for what I'm getting with the Intel extensions, maybe I only want half of that
23:36 jekstrand: I already get an unimplemented function call
23:37 jekstrand: Sachiel: ^^
23:37 jenatali: Yeah, you just need a way to inline the resulting code from an Intel extensions shader
23:38 jekstrand: Ok, that shouldn't be terribly hard
23:38 jekstrand: I was just concerned that clover would barf if it sees an unimplemented function. But maybe if I do it early enough, clover won't notice.
23:39 jenatali: Right
23:39 jenatali: Just make sure you do it before the regular inlining pass
23:39 jenatali: Which IIRC is pretty early
23:40 jenatali: jekstrand: I'm liking you looking at CL, it makes it a lot easier to upstream our patches when you end up wanting bits and pieces of them :P
23:40 jekstrand: jenatali: hehe :D
23:42 jenatali: jekstrand: This is the commit you're looking for: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6035/diffs?commit_id=3362a4a9c0efa355e9ea15947b56c82b411778ea
23:42 jenatali: Though there's a later commit in the series which does pass the same shader to vtn to enable it to generate the calls
23:45 airlied: jenatali: just string him along by planting fixes he needs into broader MRs :-P
23:47 airlied: kinda feels like CL 1.2 isn't horribly out of reach, and if we get generic pointers as well,
23:47 jenatali: airlied: CLOn12 is pretty much ready to certify for CL1.2, yeah
23:47 jenatali: Once everything's upstreamed, Clover won't be far behind
23:50 jenatali: I'm honestly surprised but super happy at how quickly we were able to get here. Mesa/nir has been awesome