01:14kode54: btw
01:14kode54: that issue I reported for flickering water layer in Borderlands 3?
01:14kode54: I replicated it in Borderlands 2
01:14kode54: I'll post as such in the issue
01:14kode54: easier to get to it there, you don't have to watch any benchmark
01:15kode54: you can just left-click and pan the view around to look at the lake in the valley below
07:32pq: emersion, I read your reply on the KMS color pipeline thread, and I agree with everything you wrote.
07:57emersion: pq, sweet!
13:05FireBurn: Would someone mind reverting 58e67bb3c131da5ee14e4842b08e53f4888dce0a? I'm hoping to avoid it getting sent to airlied and onto linus
13:13zamundaaa[m]: Is there a way to import an EGL fence?
13:15zamundaaa[m]: I'm trying to blit a texture from one GPU to another, and with NVIDIA that causes artifacts because of the lack of synchronization. Ideally I'd create an EGL fence on the source GPU, and have the other GPU wait before doing the blit with eglWaitSync, but I haven't found a way to actually get a fence for this on the destination GPU
13:20emersion: look at weston maybe
13:21pq: EGL_ANDROID_native_fence_sync might be the key
13:24zamundaaa[m]: ah, so the fd is passed in as an attribute. Thanks!
13:25ickle: win 16
13:47emersion: sad that there's no drm_syncobj love
13:59alyssa: gfxstrand: ok, I have something typed out to kill off abs/neg/fsat modifiers without requiring any nontrivial changes to backends
13:59alyssa: (in particular, it does not require the backend to have working copyprop or dead code elimination)
13:59alyssa: I hate it, but more than that I hate that we have backends that don't have DCE
13:59alyssa: and, it means we actually have a chance of killing them off
14:00gfxstrand: :sob
14:00alyssa: so, probably worth the stupid
14:00alyssa: the usual strategy--
14:00alyssa: ahead-of-time trivialize pass that inserts copies to ensure fabs/fneg/fsat are folded 100% of the time,
14:00alyssa: helpers to chase through fabs/fneg/fsat at backend isel time,
14:01alyssa: and a guarantee to backends that fabs/fneg/fsat will be chased 100% of the time so they just need to Not emit any code for them
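The three-part strategy above can be sketched with a toy model. This is a minimal illustration in Python, not Mesa code: the `Instr` class and the `chase_source_mods` helper are hypothetical names, and fsat (a destination modifier) is left out for brevity. The helper walks from a use back through fneg/fabs and returns the underlying value plus the modifiers a backend would fold into the consuming instruction, so the backend never has to emit code for fneg/fabs themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    """Toy SSA instruction: an opcode and at most one source (hypothetical)."""
    op: str
    src: Optional["Instr"] = None


def chase_source_mods(value):
    """Walk through fneg/fabs and return (base, abs, neg).

    The folded source is read as: neg ? -(abs ? |base| : base).
    """
    use_abs = False
    use_neg = False
    while value.op in ("fneg", "fabs"):
        if value.op == "fabs":
            # An outer abs hides every modifier found further in.
            use_abs = True
        elif not use_abs:
            # A neg underneath an abs has no effect; otherwise it flips the sign.
            use_neg = not use_neg
        value = value.src
    return value, use_abs, use_neg


x = Instr("load_input")
# -|(-x)| folds to base=x with both abs and neg set.
base, use_abs, use_neg = chase_source_mods(
    Instr("fneg", Instr("fabs", Instr("fneg", x))))
print(base.op, use_abs, use_neg)
```

In this model the ahead-of-time pass would only need to ensure that every fneg/fabs is reachable this way from all of its users; that part is not shown.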
14:02gfxstrand: Running HSW now
14:03gfxstrand: Let's see how bad the damage is.
14:10alyssa: from the nir_register changes?
14:10alyssa: (Intel doesn't use lower_to_source_mods anymore so thankfully it's spared of this particular abomination)
14:10alyssa: only uses are ntt, etnaviv, a2xx, lima, and r600/sfn
14:11alyssa: I am not volunteering to rewrite people's compilers
14:11alyssa: so.. this is the consolation prize
14:12gfxstrand: I'm more worried about vec-to-reg
14:13alyssa: nod
14:13alyssa: midgard seems happy with it
14:13gfxstrand: Okay, ptn bug fixed.
14:27gfxstrand: alyssa: https://paste.centos.org/view/89c5ba29
14:27gfxstrand: I've not done any analysis on why
14:28gfxstrand: Also, that's vec4-only. I filtered out FS/CS.
14:28alyssa: gfxstrand: :| disappointing
14:28alyssa: I mean. I would still rip off the bandaid personally, but
14:29alyssa: midgard was total instructions in shared programs: 1518573 -> 1514188 (-0.29%)
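For reference, the percentage in a shaderdb line like the one above is just the relative change in the instruction count. A minimal sketch, using the two totals quoted for midgard (the `percent_change` helper is a name made up here):

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


before, after = 1518573, 1514188
print(f"{before} -> {after} ({percent_change(before, after):+.2f}%)")
```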
14:29gfxstrand: I'll look into it this afternoon
14:29alyssa: I guess what we're seeing here is that Intel has significantly better vec4 copyprop than Midgard and we're getting a regression to the mean
14:31alyssa: gfxstrand: what's your personal threshold for acceptable shaderdb hit?
14:47hramrach: hello, what cards are supported?
14:49hramrach: RADV page https://docs.mesa3d.org/drivers/radv.html is a stub that points to https://www.x.org/wiki/RadeonFeature/ which has a nice feature table which ends with Arctic Islands. So I suppose for Navi I should turn to Windows?
14:50pendingchaos: https://www.x.org/wiki/RadeonFeature isn't useful for determining hardware support
14:50pendingchaos: that feature table isn't about radv
14:51pendingchaos: apparently it documents radeonsi, but clearly outdated
14:51hramrach: so what is useful documentation for radv?
14:52pendingchaos: https://docs.mesa3d.org/envvars.html#radv-driver-environment-variables
14:52pendingchaos: besides those, it's just a Vulkan driver
14:52alyssa: gfxstrand: also, pushed nir/legacy-mods, it has your pushed fix squashed in though not the unpushed ptn fix
14:53hramrach: but that documents some driver settings, not what hardware it supports
14:54pendingchaos: RADV should support all AMD GPUs supporting Vulkan
14:55hramrach: the moment they are released?
14:56pendingchaos: there might be some delay (both because of release schedules and development effort) depending on how different the new GPU is from predecessors
14:57pendingchaos: gfx1100 and gfx1101 for example, should be basically the same
14:57pendingchaos: gfx1030 and gfx1100 had significant differences
14:57hramrach: so how do I tell when a GPU has aged enough to be supported?
14:57llyyr: rdna3 was supported pre-release already
14:58llyyr: generally stuff should work on release but it might be buggy and that gets sorted out over time
14:59hramrach: That would be a nice improvement since the times that table is from
14:59pendingchaos: I don't think there's any official list of RADV hardware support, so you can't easily tell
14:59llyyr: radv supports all GCN/RDNA cards
14:59pendingchaos: I think usually phoronix and such will release an article when a generation of gpus is supported
15:00llyyr: so from hd 7000 series up to rx 7xxx
15:01hramrach: yes, phoronix would probably have that
15:04pendingchaos: ah, release notes also have new hardware support
15:04pendingchaos: like https://docs.mesa3d.org/relnotes/22.3.0.html has "Mali T620 on panfrost" and "initial GFX11/RDNA3 support on RADV"
15:05hramrach: but they are split by version, not by hardware
15:22mupuf: hramrach: assume everything to work, unless the hardware is really exotic
15:22mupuf: If it doesn't: file a bug
15:23mupuf: Generally, the faster GPUs are better supported
15:23mupuf: Unless they are really expensive
15:23mupuf: (CDNA cards for example)
15:26pendingchaos: I don't think RADV works at all on CDNA
15:26pendingchaos: at least, I don't think the newer ones can support Vulkan
15:29mupuf: Right :D
16:10hramrach: so let's say that consumer cards should work, server cards not
16:10hramrach: thanks
16:27alyssa: Why is the LLVM IR generated by gallivm so chunky T_T
16:27alyssa: I guess that includes a big chunk of rasterizer in there too?
16:45DavidHeidelberg[m]: anholt: btw. are the `swrast` runners definitely lost, or is there a chance they come back at some point in the future?
16:45anholt: they are gone. not going to be standing anything up at least until we have kata.
16:49agd5f: pendingchaos, you could support vulkan on CDNA cards, but they would only have transfer/compute/media queues, no GFX.
16:50pendingchaos: didn't the recent CDNA remove texture filtering? I think Vulkan requires that
16:55pendingchaos: I guess it probably could be emulated with a lot of effort
16:55pendingchaos: but maybe there's more mandatory Vulkan features that are missing
16:58agd5f: pendingchaos, should still be supported, at least according to the MI200 ISA document
16:58agd5f: supports everything needed for OCL 2.x
17:00pendingchaos: seems to have image_sample
17:00pendingchaos: no mipmaps though?
17:03pendingchaos: or gather
17:03alyssa: agd5f: then why did Marek need to do a software texturing pipeline for CDNA?
17:10jenatali: gfxstrand, alyssa : Ping on !23173, there's still a few nir patches in there in need of ack/review
17:24alyssa: jenatali: trade you? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests?scope=all&state=opened&author_username=alyssa&label_name[]=NIR&milestone_title=Needs%20review
17:24jenatali: :P
17:24alyssa: the scoped barriers one is only 3 patches left
17:24alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23191#note_1946902
17:24jenatali: I suppose that's only fair, I'll take a look
17:33agd5f: alyssa, maybe it is then? Not sure.
18:44alyssa: jenatali: thoughts on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23351/diffs?commit_id=f6e9ab6b547af1b0eb241e5afa133ddc2b04e4c8 ?
18:45alyssa: the whole SM5 shift mess is, as usual, a mess
18:46alyssa: The obviously alternative is changing ubfe_imm to only produce ubfe if lower_bitfield_extract is set, otherwise, ubitfield_extract is produced
18:46alyssa: s/obviously/obvious/
18:46alyssa: It's unclear to me if that's better or worse
18:47alyssa: For the _imm case, ubfe and ubitfield_extract are interchangeable (since we can just mask the immediate at build time)
18:47alyssa: (or better yet, assert the immediate < 32)
18:47alyssa: Hmm.. maybe I should do that actually
18:47alyssa: pendingchaos: thoughts on ^^?
18:53alyssa: I once again wonder if the default really should be khronos behaviour and _sm5 suffixed ops do the masked thing... meh
18:53alyssa: would like to kick that can down the road again though.. I just want ubfe_imm or equivalent for agx
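The point about the _imm case can be illustrated with a small model. This is a sketch under stated assumptions, not the NIR opcode definitions: it assumes the SM5-style op masks offset and width to their low 5 bits, that the Khronos-style op is only defined when offset + bits <= 32, and that `base` is a 32-bit unsigned value. The function names are made up for the example.

```python
def ubfe_sm5(base, offset, bits):
    """SM5-style extract: offset and bits are masked to 5 bits."""
    offset &= 0x1F
    bits &= 0x1F
    if bits == 0:
        return 0
    return (base >> offset) & ((1 << bits) - 1)


def ubitfield_extract_khr(base, offset, bits):
    """Khronos-style extract: out-of-range inputs are treated as undefined."""
    if offset < 0 or bits < 0 or offset + bits > 32:
        raise ValueError("result is undefined for this offset/bits pair")
    if bits == 0:
        return 0
    return (base >> offset) & ((1 << bits) - 1)


value = 0xDEADBEEF
# With both immediates below 32 and in range, the two ops agree.
for offset in range(32):
    for bits in range(32):
        if offset + bits <= 32:
            assert ubfe_sm5(value, offset, bits) == \
                ubitfield_extract_khr(value, offset, bits)

# bits == 32 is where they diverge: the mask turns 32 into 0.
print(hex(ubitfield_extract_khr(value, 0, 32)), hex(ubfe_sm5(value, 0, 32)))
```

Under these assumptions, asserting that the immediates are below 32 at build time is enough to make the two forms interchangeable, which matches the suggestion in the discussion.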
19:15jenatali: 🤷‍♂️ I looked at it but I don't really have any strong opinions on the matter
19:19alyssa: valid
19:19alyssa: maybe pendingchaos does
19:24pendingchaos: I think building ubfe/ubitfield_extract depending on lower_bitfield_extract and using a unified helper makes sense, but having two helpers doesn't sound like a real problem
19:25alyssa: sure
19:25alyssa: let me know the preferred bikeshed colour and I'll paint it
19:37alyssa: airlied: when you have a few minutes could I pick your brain about AoS/SoA gallivm?
19:37alyssa: usually don't like to "ask to ask" but I don't yet have a coherent question formulated
19:38airlied: alyssa: my brain has defeated you by purging AoS/SoA knowledge right down to knowing which one is which
19:38airlied: but yes ask and pick away
19:40airlied: alyssa: also sampling AoS/SoA is slightly different to the AoS/SoA execution model
19:40airlied: by default we use soa execution and mostly soa sampling but sometimes sampling goes to aos mode
19:41airlied: for one narrow use case we use aos execution
19:43alyssa: oh boy
19:43alyssa: airlied: The basic question I have is that load_reg/store_reg take arrays of LLVMValueRefs
19:44alyssa: instead of just a single LLVMValueRef for the whole vector
19:44alyssa: it seems in the AoS path only the [0] component is used
19:44alyssa: but in the SoA path every component is used separately
19:45alyssa: I guess "AoS" is like vec4 gpus and "SoA" is like scalar GPUs?
19:45alyssa: in that case, why would gallivm even see vectorized NIR in the first place?
19:45alyssa: why not scalarize completely in NIR, so we only need the single LLVMValueRef (corresponding to either the one component or the whole vector)?
19:46airlied: probably because the core code was originally TGSI designed and TGSI is vec4
19:46airlied: so it just kept doing that when I ported it to NIR, and handled vectors
19:46airlied: but it doesn't really correspond to GPUs that well
19:47alyssa: OK
19:47airlied: in SoA mode it stores 4/8-wide scalars
19:48airlied: so a vector in SoA mode is just a set of vec-len scalars each of which is 4/8 channels wide
19:48airlied: depending on avx etc
19:48alyssa: yes, that's how scalar GPUs work
19:48airlied: oh my scalar gpus have uniform regs which llvmpipe doesn't :-P
19:48alyssa: mine don't
19:49airlied: AoS is a special case for storing 16-wide chars
19:49airlied: so that you can process 4 8-bit RGBA pixels in one go
19:49airlied: it's very limited in scope in what you can do
19:49airlied: it's just to provide a fast path for blits and copies
19:49alyssa: Right, ok
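The two layouts described above can be pictured with plain lists. This is only a sketch of the data layout, assuming four 8-bit RGBA pixels as in the blit fast path mentioned; the helper names are invented and nothing here is gallivm code. AoS keeps the 16 bytes interleaved per pixel, while SoA keeps one lane-wide list per component.

```python
def aos_to_soa(pixels):
    """Split interleaved r0,g0,b0,a0,r1,... bytes into [R, G, B, A] lists."""
    return [pixels[c::4] for c in range(4)]


def soa_to_aos(channels):
    """Interleave [R, G, B, A] lists back into one flat pixel-major list."""
    return [value for pixel in zip(*channels) for value in pixel]


# Four RGBA pixels, 16 bytes in total (AoS form).
aos = [10, 20, 30, 255,
       11, 21, 31, 255,
       12, 22, 32, 255,
       13, 23, 33, 255]

soa = aos_to_soa(aos)
print(soa[0])  # the R component of all four pixels
```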
19:50airlied: so yes we could probably scalarize completely in NIR for the aos case, but the TGSI code still exists
19:50alyssa: OK
19:50alyssa: mostly i'm trying to understand why assign_dest (for example) takes an array of valuerefs instead of just one
19:50alyssa: but you're saying that's just TGSI legacy?
19:51airlied: what one value ref would it take?
19:51airlied: if dest has 4 components
19:51airlied: you can't do vectors of arrays of values in llvm IR
19:51alyssa: why would you ever have that, though?
19:51airlied: because we haven't scalarised 4 component stores
19:51alyssa: ooh
19:51airlied: though maybe in practice we have
19:52alyssa: like, store_ssbo?
19:52airlied: I think the main use cases are the vec4 type constructors
19:52airlied: nir_vec4 etc
19:53alyssa: right..
19:53airlied: where you have one ssa value that is a vector of scalars but the scalars are 8-wide arrays
19:53alyssa: maybe I'm objecting to the "Loop for R,G,B,A channels" in the SoA case in visit_alu
19:53alyssa: not really interested in reworking this. just trying to figure out what to do for my NIR rework
19:53alyssa: and today is llvmpipe
19:53alyssa: day
19:55airlied: yeah so we do all the operations once on each component of the vector, then collect the results, then store them back as an array
19:56airlied: I just didn't see the value for register stores of sticking them into an LLVM array
19:56airlied: just to pull them back out again
19:56airlied: since register stores actually go to memory, as opposed to just hide inside the ssa value hash table
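The per-component loop being discussed can be sketched as follows. This is a toy model in Python rather than the actual visit_alu code: a vector value is assumed to be a list with one entry per component, each entry being a list of lanes (4 or 8 wide in the description above), and `soa_alu` is a hypothetical name.

```python
import operator


def soa_alu(op, a, b):
    """Apply a binary op to two SoA vectors, one component at a time."""
    results = []
    for comp_a, comp_b in zip(a, b):  # loop over the R, G, B, A components
        # Each component is one lane-wide scalar; all lanes are handled together.
        results.append([op(x, y) for x, y in zip(comp_a, comp_b)])
    return results  # collected back into an array of per-component values


# A vec2 value across 4 lanes: two components, four lanes each.
a = [[1, 2, 3, 4], [10, 20, 30, 40]]
b = [[1, 1, 1, 1], [5, 5, 5, 5]]
print(soa_alu(operator.add, a, b))
```

If the NIR were fully scalarized first, the outer loop would always run once, which is the cleanup suggested in the conversation.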
19:56alyssa: why are there multiple components on the vector?
19:56alyssa: aren't we calling lower_alu_to_scalar in the SoA case?
19:57alyssa: I guess we aren't
19:57alyssa: we should be, I guess
19:57airlied: lavapipe does, not sure llvmpipe does
19:57airlied: probably a cleanup possible there
19:57alyssa: doesn't look like it does
19:57alyssa: yeah.. not today's cleanup though
19:57alyssa: currently defeaturing nir_register from llvmpipe
19:58airlied: there's probably quite a lot of llvm side stuff that could be moved to NIR side
19:58alyssa: Yeah
19:58airlied: it's mostly a legacy of TGSI and whatever state nir was in when I wrote it
19:58alyssa: piles of the graphics pipeline emulation code could be common NIR passes too I think
19:58alyssa: llvmpipe using nir_lower_blend anyone? ;-D
19:58airlied: oh that stuff is so finely hand written
19:59alyssa: D=
20:01airlied: I fear to tread in the blending pipeline, so many hand coded swizzle calls that I don't really understand
20:03gfxstrand: That sounds like a good argument for NIRifying
20:03alyssa: :crab_fire:
20:07airlied: it would be, but I doubt it would get as fast
20:07airlied: since it's mostly hand writing LLVM IR to optimise things
20:07airlied: not sure translating NIR would achieve the same level, since NIR doesn't have a view into the LLVM 4-8 wide fun
20:10airlied: alyssa: I think the other reason we don't scalarise in NIR is for the soa/aos decision point there might not be a simple point to do it
20:10alyssa: hum
20:12mareko: alyssa: the CDNA thing is going to be answered in due time
20:13airlied: mareko: do you know if anyone runs a cdna card in a workstation? :-)
20:14mareko: airlied: I don't know much about where CDNA is used
20:14airlied: seems to be server gpus, but who wants a rack in their home :-P
20:15alyssa: mareko: mysterious ^_^
20:16mareko: other than what's publicly knows, such as El Capitan
20:16mareko: *known
20:18HdkR: airlied: I have a rack in my home :P
20:19HdkR: Might get a second one if I feel like experimenting a bit more
20:21airlied: HdkR: do you have soundproofing :-)
20:21airlied: the only locations I could put a rack would be near me or outside in the sun
20:21mareko: and cooling
20:22HdkR: Nah, I'm only rocking 4U chassis with slow spinning fans in it. Loudest thing is one of the 10gbit network switches
20:22alyssa: airlied: Do you hate this https://rosenzweig.io/0001-gallivm-Switch-to-reg-intrinsics.patch
20:23airlied:has to wear noise cancelling headphones to compile llvm or use the preprod navi33 card I have
20:23alyssa: Lool
20:23HdkR: Sounds about right for most rack-mount things :D
20:23karolherbst: airlied: that reminds me...
20:24karolherbst: btw, how did you end up connecting that one to your system?
20:25airlied: alyssa: seems to be about what I'd expect
20:26airlied: karolherbst: I ended up turning my machine on its side, putting it on a cardboard box, and when I put that card in I stick another piece of cardboard box between it and the PSU to ensure it is supported
20:26alyssa: airlied: =D
20:26karolherbst: oof
20:26airlied: I really should get a PCIe extender so I can put it flat on something
20:27karolherbst: yeah.. I planned to use a PCIe extender as well...
20:27karolherbst: forgot about it
20:28alyssa: airlied: welp, 4 backends down, 11 to go.. ugh... https://gitlab.freedesktop.org/mesa/mesa/-/issues/9051
20:29airlied: alyssa: would be interested to see what CI says
20:29alyssa: me too
20:30jenatali: alyssa: Are there that many backends that consume registers?
20:30alyssa: jenatali: Sadly, yes
20:30DemiMarie: anholt: I take it you mean “kata containers”? Is that to prevent any more sandbox escapes?
20:30jenatali: Oof
20:31alyssa: though DXIL isn't one of them so you're off the hook I Guess
20:32jenatali: Yep
20:32alyssa: I need to update the MR description
20:33alyssa: 'cause killing off abs/neg/sat modifiers is also in scope for this now :~)
20:33DemiMarie: Why is CDNA still considered a GPU? It can’t even do graphics, so I imagine it would belong under drivers/accel instead.
20:33alyssa: that's a lot less onerous though, since no mature backends use them
20:33alyssa: ntt, etnaviv, a2xx, lima, and r600
20:33alyssa: and I did ntt
20:34alyssa: IDK who's going to do the other 4
20:34alyssa: I typed up a lot of helpers to make it as painless as possible to migrate, but even so
20:35alyssa: I'm not volunteering myself to work on those 4
20:44alyssa: gfxstrand: ooh I hit the prog-to-nir thing in CI, fun
20:44alyssa: pushed to the MR the source/dest modifier stuff and the llvmpipe conversion at any rate
20:46alyssa: zink, you're up
21:34airlied: do I report spam or get unlimited vbucks, big choices
21:41alyssa: what happened with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23191 ?
21:41alyssa: the CI pipeline I mean
21:41alyssa: I am very confused
21:41jenatali: There was a failed job
21:45alyssa: and that failed the whole pipeline?
21:45alyssa: neat. that's new
21:45jenatali: Hm? Is it new?
21:46alyssa: maybe?
21:47alyssa: well, in that case I need help since IIRC iris doesn't build on arm
21:48alyssa:tries anyway in case that was fixed
21:50alyssa: oh, it does, cool
21:50alyssa: where's my drm-shim though
21:51alyssa: iris doesn't have drm-shim? :(
21:51alyssa: intel_stub_gpu. right
21:53alyssa: OK, reproduced
21:57alyssa: Ohhhh
21:57alyssa: Lol
21:57alyssa: OK
21:57alyssa: I see what happened
21:57alyssa: whoopsies
21:57alyssa: today's edition of "stupid spot the bug"
21:58alyssa: and fixed
21:58alyssa: well test still crashes for me because of arb_fragment_shader_interlock-image-load-store: ../src/intel/isl/isl_tiled_memcpy.c:609: choose_copy_function: Assertion `!"" "ISL_MEMCOPY_STREAMING_LOAD requires sse4.1"' failed.
21:59HdkR: A bit difficult to get SSE4.1 on your Macbook
21:59alyssa: Little bit yeah
21:59gfxstrand: hehe... Yeah....
22:00gfxstrand: I thought we had a non-SSE path
22:00alyssa: gfxstrand: you do, but iris was specifically asking for streaming
22:00gfxstrand: Ah, yes...
22:00gfxstrand: Because it can
22:00gfxstrand: Because it only runs BDW+ which is always paired with a CPU that supports SSE4.1
22:00gfxstrand: Unless that GPU is an Arc in which case it could be plugged into raspberry pi for all you know.
22:01alyssa: Yep
22:01alyssa: Well, not a raspberry pi I don't think
22:01alyssa: low to mid-tier arm doesn't work with dGPUs usually
22:01HdkR: Probably more a SolidRun Honeycomb or Ampere eMAG
22:01alyssa: yeah
22:01alyssa: server grade arm64 + dGPU
22:09DemiMarie: Are there any mid-grade Arm64 chips?
22:09jenatali: Oof, that's a fun bug. Glad there was a test that caught it, though I'm surprised there was only one failure
22:09DemiMarie: mid-grade = desktop PC class
22:09gfxstrand: Apple
22:10gfxstrand: Otherwise, not that I'm aware of.
22:11DemiMarie: Is that likely to ever change?
22:11jenatali: Some of QC's higher end chips are approaching that IMO
22:16DemiMarie: gfxstrand: I’m a little bit salty about Apple having so many non-standard SMMUs. Means that Xen support for Apple Silicon is unlikely to ever happen.
22:25alyssa: gfxstrand: Hmm?
22:25alyssa: oh I see
22:28alyssa: ok, zink is converted too
22:28alyssa: I think that's enough backends for proving the design is sensible
22:28alyssa: i'm off for the night then
22:29alyssa: pretty steady progress though
22:30alyssa: getting close to taking the Draft status off, so that's exciting
22:30alyssa: for me
22:30alyssa: not exciting if you were ignoring it and will soon have to convert your backends :~P
22:58Lynne: who works on vulkaninfo? khronos?
23:12gfxstrand: alyssa: Okay, I think I found at least some of the HSW regressions
23:12gfxstrand: lower_vec_to_movs is being a tiny bit more clever about placement of register stores.
23:12gfxstrand: But in a pretty niche edge-case
23:14bnieuwenhuizen_: Lynne: lunarg
23:14Lynne: do they have an issue tracker?
23:14gfxstrand: Lynne: https://github.com/KhronosGroup/Vulkan-Tools
23:16Lynne: thanks, I thought about writing an equivalent to vainfo/vdpauinfo/nv-video-info, but thought it would be better off being a part of vulkaninfo
23:18gfxstrand: alyssa: Basically, you're missing try_coalesce
23:19gfxstrand: Or rather your coalesce_swizzle thing isn't quite as good for some reason.
23:22alyssa: gfxstrand: I'd believe it
23:35gfxstrand: alyssa: So the big difference as far as I can tell is that try_coalesce in lower_vec_to_movs puts the register write directly in the ALU op that generates the swizzle source. In a store_reg world, that would mean placing a store_reg immediately after.
23:35gfxstrand: alyssa: Whereas in lower_vec_to_regs, you insert the store_reg at the vec location and then eliminate the swizzling mov, leaving the store_reg as-is.
23:35gfxstrand: So the store_reg ends up living at the vec location.
23:35alyssa: => extra moves because that store isn't trivial
23:35alyssa: ?
23:36alyssa: s/isn't/may not be/
23:36gfxstrand: I'm not following
23:36alyssa: I may not be either
23:37alyssa: The reason the placement matters is presumably because putting the store_reg too late will cause nir_trivialize_registers to insert a move that won't be coalesced?
23:37gfxstrand: No
23:37gfxstrand: It's because, thanks to SSA, the coalescing that happens in try_coalesce works across blocks.
23:38gfxstrand: It doesn't matter if the fdp4 or whatever it is happens to be 17 blocks away, if the vec is the only user, we can re-swizzle it and write the register as part of the fdp4.
23:38alyssa: haswell supports control flow????????
23:38gfxstrand: Yes, sadly.
23:38gfxstrand: :P
23:39alyssa: we dont talk about broadwell, no no no
23:39gfxstrand: By contrast, when you emit the store_reg at the location of the vec and then try to coalesce later, the problem is much harder because you're moving a store_reg with insufficient information.
23:39gfxstrand: Well, you have enough information
23:39gfxstrand: It's possible
23:39gfxstrand: Each component is written exactly once
23:40gfxstrand: But it's a lot harder than when we're doing it in try_coalesce and the value we're dealing with is SSA.
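The coalescing idea described above can be shown with a very small toy. It is a sketch under heavy simplifying assumptions, not lower_vec_to_movs: only a replicated-result op such as fdp4 is considered coalescable, use counts are given up front, and the `Alu` class and `lower_vec` function are hypothetical. When the vec is the only user of a source, the producer is rewritten to write the register channel directly, wherever it lives in the program, and no mov is emitted for that channel.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Alu:
    """Toy ALU instruction producing one SSA value (hypothetical)."""
    name: str
    op: str
    num_uses: int
    reg_write: Optional[Tuple[str, str]] = None  # (register, channel)


def lower_vec(reg, sources):
    """Lower vec(sources...) into writes of reg, coalescing where possible."""
    movs = []
    for chan, src in zip("xyzw", sources):
        if src.op == "fdp4" and src.num_uses == 1 and src.reg_write is None:
            # The vec is the only user: let the producer write reg.chan itself.
            src.reg_write = (reg, chan)
        else:
            # Otherwise fall back to an explicit move at the vec location.
            movs.append(f"mov {reg}.{chan}, {src.name}.x")
    return movs


dot = Alu("ssa_1", "fdp4", num_uses=1)
shared = Alu("ssa_2", "fdp4", num_uses=2)  # has another user, so not coalesced
print(lower_vec("r0", [dot, shared]), dot.reg_write)
```

Because the decision is made on SSA values, the block in which the producer sits does not matter in this model, which is the property being contrasted with moving a store_reg after the fact.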
23:40alyssa:is trying to page in enough details of the passes for this to make sense
23:44alyssa: gfxstrand: I'm still not following why it matters where the store_reg instruction is placed
23:45alyssa: except I guess because trivialize_registers inserts extra moves, since it doesn't see across block boundaries
23:46gfxstrand: It matters because back-end vec4 copy-prop and register coalesce suck
23:47alyssa: Oh, well, yes
23:47idr: Understatement of the year...
23:48alyssa: gfxstrand: I can try to reintroduce try_coalesce instead of the 2 pass thing
23:48alyssa: tomorrow, I mean. it's past working hours now I just saw an interesting problem
23:48gfxstrand: Yeah
23:48gfxstrand: That's fine
23:48alyssa: would appreciate if you can send me a small affected shader that I can play with
23:49alyssa: but if not I can probably construct something
23:49alyssa: I don't really remember why I did the 2 pass thing
23:51gfxstrand: alyssa: It's sitting in your e-mail
23:51alyssa: thanks!
23:51alyssa: it appears I may have texted you the reason weeks ago but had disappearing messages on
23:51alyssa: couldn't have been that important (-: