02:01 jekstrand: is "Integrated AMD Radeon™ Vega 8" any good?
02:04 imirkin: define good
02:04 jekstrand: imirkin: I work for Intel
02:05 imirkin: so the bar's pretty low, huh? :p
02:05 jekstrand: Is it going to be better or worse than a SKL GT2?
02:05 HdkR: Better
02:05 HdkR: Think of it as something slightly higher performing than a 64EU part :P
02:07 agd5f: jekstrand, the 8 means 8 compute units on the GPU
02:07 anarsoul|2: IIRC windows benchmarks show that AMD performs better
02:07 anarsoul|2: I'm not sure how the linux experience is, though...
02:08 jekstrand: Sadly, it'll probably spend most of its time running Windows. :(
06:39 c4droid: Hi, I'm working on building my BLFS system's X environment. When installing mesa I want to disable LLVM support, but when I set the meson option -Dllvm=false, meson reports an error: "The following drivers require LLVM: Radv, RadeonSI, SWR. One of these is enabled, but LLVM is disabled." Should I turn off the meson options -Ddri-drivers, -Dgallium-drivers, -Dgallium-nine, -Dglx, -Dosmesa?
06:42 HdkR: c4droid: Yes, remove the vulkan drivers with -Dvulkan-drivers= and remove swr from the list you pass with -Dgallium-drivers=
06:43 HdkR: radeonsi also lives under the gallium-drivers option
06:57 c4droid: HdkR: Just keep these options, but not give them a value?
07:09 HdkR: c4droid: You'd want the default list and then remove those three drivers
07:09 HdkR: You'll have to load up meson.build to find the default list of drivers
08:34 MrCooper: HdkR: or "meson configure <build directory> | grep drivers", or bin/meson-options.py :)
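A workable non-LLVM configure line, then, is roughly the following (a sketch; the gallium driver list shown is illustrative and should be taken from the defaults in meson.build for the Mesa version being built):

    # drop radv via an empty vulkan-drivers list, and leave
    # radeonsi and swr out of the gallium-drivers list
    meson build/ -Dllvm=false \
        -Dvulkan-drivers= \
        -Dgallium-drivers=nouveau,swrast,virgl
    # verify the result, as MrCooper suggests:
    meson configure build/ | grep drivers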
08:35 MrCooper: ajax: careful, clover depends on libclc, which doesn't support e.g. Navi yet; looks like Vega should indeed work though
09:33 hakzsam: MrCooper: why do we need to install libtinfo-dev? that seems to be the root cause of the s390 failure if the i386 package is installed
09:34 MrCooper: hakzsam: llvm-*-dev depends on it
09:35 hakzsam: ok
09:36 MrCooper: it's also a mostly empty package, how did you determine it to be the problem?
09:36 hakzsam: by trial and error
09:39 MrCooper: one of its dependencies seems more likely
09:40 hakzsam: I think so
09:41 MrCooper: maybe it's just some kind of coincidence anyway, similar to any 3 or more libraries under /usr/local
09:42 hakzsam: yeah, it's a weird issue anyway and not easy to track down
11:14 hakzsam: MrCooper: one workaround is to download but not install libtinfo-dev
11:15 MrCooper: for i386?
11:16 hakzsam: for all
11:18 MrCooper: k, sounds workable; does /usr/local/lib/i386-linux-gnu still need to be removed as well?
11:19 hakzsam: sadly, yes
11:25 hakzsam: it should be doable to remove that workaround if I find which packages break
11:26 hakzsam: I'm trying something else
11:29 MrCooper: it would be better to use local podman/docker for this experimentation, every new image generates 1.5 GB of egress
11:40 MrCooper: hakzsam: ^
11:41 hakzsam: yeah, but it needs qemu for testing
11:46 hakzsam: it's not always easy to test CI changes locally with podman/docker
11:46 MrCooper: qemu is in the image, what's the problem?
11:47 MrCooper: qemu-s390x /usr/lib/llvm-8/bin/llvm-config
11:51 hakzsam: I see no occurrences of qemu when gitlab executes that job?
11:56 MrCooper: the runner hosts have set it up in /proc/sys/fs/binfmt_misc
11:57 hakzsam: okay, that makes sense
11:57 MrCooper: even if the qemu-user package happens not to be installed yet in the image you experiment with, just install it :)
11:58 hakzsam: yeah
12:06 daniels: debian binfmt-support + qemu-user-static packages btw
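To reproduce this locally along the lines described above (a sketch; the handler is normally registered on the runner hosts, and on a Debian host the packages daniels names provide it):

    # register qemu-user handlers for foreign-arch binaries
    apt-get install -y binfmt-support qemu-user-static
    # confirm the s390x handler is in place
    cat /proc/sys/fs/binfmt_misc/qemu-s390x
    # or skip binfmt entirely and invoke qemu directly, as MrCooper does
    qemu-s390x /usr/lib/llvm-8/bin/llvm-config --version
    # the experiments themselves can run in the CI image locally, e.g.
    # (image path hypothetical):
    # podman run --rm -it registry.freedesktop.org/mesa/mesa/debian-x86_test:tag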
13:18 hakzsam: MrCooper: apt-get remove libicu63:i386 fixes the crash and I don't need to remove /usr/lib/i386-linux-gnu
13:20 hakzsam: still for weird reasons though :)
13:27 imirkin_: too much emoji for s390 to handle?
13:39 MrCooper: the crash happens in qemu x86 code, not in s390x code
13:40 MrCooper: hakzsam: hmm, does that remove other packages depending on libicu63:i386 as well?
13:41 hakzsam: yes
13:41 hakzsam: The following packages will be REMOVED:
13:41 hakzsam: libgphoto2-6:i386 libicu63:i386 libwine:i386 libxml2:i386 wine32:i386
13:42 MrCooper: have you tried only removing a subset of those?
13:42 hakzsam: yes
13:44 hakzsam: 'apt-get remove libgphoto2-6:i386 libwine:i386 libxml2:i386 wine32:i386' doesn't fix it, it's really libicu63
13:49 MrCooper: and installing only libicu63:i386 again makes it fail?
13:51 hakzsam: yes
13:53 MrCooper: sounds like we have a winner then :)
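The bisection, reconstructed as commands (assuming the package set quoted above):

    # removing everything except libicu63 doesn't help:
    apt-get remove libgphoto2-6:i386 libwine:i386 libxml2:i386 wine32:i386
    # removing libicu63 (which drags out its i386 dependents) fixes the crash:
    apt-get remove libicu63:i386
    # and reinstalling it alone brings the crash back:
    apt-get install libicu63:i386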
13:53 hakzsam: yeah, but it really makes no sense
13:55 hakzsam: hopefully my pipeline will pass now
13:56 MrCooper: yeah; I wonder if it could be some kind of virtual memory layout conflict between x86 and s390x maybe, and shuffling around shared libraries just makes it more or less likely to go kaboom
13:56 hakzsam: MrCooper: does it look reasonable to you to remove that package in the s390x job?
13:57 MrCooper: seems no worse than removing /usr/local/lib/i386-linux-gnu
13:57 hakzsam: right
13:57 ajax: what kaboom is this?
13:58 ccr: https://www.youtube.com/watch?v=t9wmWZbr_wQ
13:59 hakzsam: qemu crashes when executing llvm-config-8 for s390 https://gitlab.freedesktop.org/hakzsam/mesa/-/jobs/1893487
14:10 hakzsam: MrCooper: it crashed inside CI https://gitlab.freedesktop.org/hakzsam/mesa/-/jobs/1900710 :(
14:10 hakzsam: but works locally
14:40 hakzsam: (I was using an image for a WIP branch that was slightly different than the MR one)
14:51 vsyrjala: anyone interested in better dp aux debugs https://patchwork.freedesktop.org/patch/350386/?series=72485&rev=1 ?
15:05 hakzsam: oh well, RADV on 32-bit broken again https://gitlab.freedesktop.org/hakzsam/mesa/-/jobs/1901422
15:27 hakzsam: MrCooper: out of curiosity, who cares about building mesa on s390x architecture?
15:28 MrCooper: Red Hat (for RHEL)
15:29 hakzsam: ok
15:30 hakzsam: awesome, https://gitlab.freedesktop.org/hakzsam/mesa/pipelines/118603 it passed \o/
15:49 MrCooper: so the crash is random?
15:50 hakzsam: I think I just messed up my first experimentation because it wasn't the same image
15:51 hakzsam: just retried the job to check for randomness
15:52 MrCooper: ah, phew
15:53 hakzsam: yeah...
16:27 cmarcelo: anholt (or anholt_): can you take another look at the combine barriers pass MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3224
16:49 anholt_: cmarcelo: done
16:51 cmarcelo: anholt_: tks
17:57 MrCooper: jekstrand: some notes: 1) AFAIK KMS atomic already supports explicit sync (via in-fences and an out-fence)? 2) exclusive vs shared fences do not mean write vs read with amdgpu 3) Last Phoronix benchmarks showed gnome-shell Wayland with compositing on par with Xorg with direct scanout, so I wouldn't worry too much about that :)
17:58 jekstrand: MrCooper: 1) Oh, good to know. 2) Yes, I know; amdgpu needs fixing. I called that out in my mail 3) That's good but blits are still bad. :-)
18:02 MrCooper: anholt_: is the db410c runner well? https://gitlab.freedesktop.org/daenzer/mesa/-/jobs/1903338 has been waiting for 40 minutes
18:05 anholt_: https://gitlab.freedesktop.org/admin/jobs?scope=pending says we're backed up on db410c. We're down to just one in the old docker setup because it kept bit-rotting. my plan was to land https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4076 today which gets us 7.
18:06 anholt_: cool. cmake, disaster that it is, ignores $PATH when looking for cc and c++.
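The usual way around that cmake behaviour is to pin the compilers explicitly at first configure rather than relying on $PATH (a generic sketch, not necessarily the fix used in the MR):

    mkdir build && cd build
    # either via environment variables at the first configure...
    CC=/usr/bin/gcc CXX=/usr/bin/g++ cmake ..
    # ...or via explicit cache variables:
    cmake .. -DCMAKE_C_COMPILER=/usr/bin/gcc -DCMAKE_CXX_COMPILER=/usr/bin/g++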
18:07 MrCooper: jekstrand: you wrote "amdgpu [...] still implicitly syncs sometimes due to it's internal memory residency handling which can lead to over-synchronization", but the different fence semantics are worse than that I'm afraid; in particular, waiting only for the exclusive fence to signal isn't enough with amdgpu to make sure you read up-to-date buffer contents
18:09 Lyude: mripard: poke - did you see my message about pushing https://patchwork.freedesktop.org/series/74412/ into drm-misc-next? I just wanted to run it by a maintainer first since it touches a couple of different drivers
18:09 jekstrand: MrCooper: You have to wait for all of them?
18:10 MrCooper: yes
18:11 jekstrand: Yeah, if you look at the back-and-forth between me and König, I think it's headed that direction.
18:11 jekstrand: In particular, I suspect that, before review is over, the ioctls will always wait on all fences and always signal explicit.
18:12 jekstrand: It's a bit bigger of a hammer than I'd like ideally, but hopefully once we get sync_file plumbed everywhere, we won't have any of the over-sync.
18:12 jekstrand: It's also exactly no worse than what we have to do in Vulkan today to make this all work.
18:13 vsyrjala: i don't think the kms out fence is particularly good. it's created by the kernel during the atomic ioctl, and signalled when the flips latch. so more or less just another way to get the flip event. but for decent mailbox i would imagine you want a client supplied fb+fence which go together always, and the kernel just signals the fence when said fb is no longer in danger of getting scanned out
18:16 jekstrand: vsyrjala: Yes, we would need that for proper explicit sync in KMS
18:16 jekstrand: Also, don't we still have compositors out there that are not using atomic?
18:17 vsyrjala: is xorg a compositor?
18:17 jekstrand: yes
18:17 vsyrjala: no idea if there are still non-atomic wayland compositors
18:18 krh: vsyrjala: yea, the client should create a fence, pass to window system, window system passes it to kms to signal -> client gets wakeup directly from kernel
18:18 Lyude: vsyrjala: gnome-shell
18:19 Lyude: they're working on changing that, but no idea how long that will take
18:21 vsyrjala: krh: iirc you suggested this before too :)
18:21 daniels: yeah and we do literally have that protocol ...
18:22 krh: vsyrjala: nobody liked the idea of being able to create a fence that wasn't tied to a submitted hw job though :)
18:22 daniels: Weston implements half of it, need to get the out-fence bit reviewed
18:22 krh: "OMG the deadlocks!"
18:23 daniels: oh you still mean empty-fences, heh
18:23 daniels: jekstrand: regardless we do have explicit synchronisation in Wayland; at least Weston and Exosphere
18:27 jekstrand: daniels: Yes, I know. And since they use atomic, it may actually be correct.
18:29 jekstrand: daniels: I wasn't aware, until today, that atomic actually supported sync_file
18:29 jekstrand: That makes the situation much less dire than I thought.
18:30 mdnavare: vsyrjala: For monitor range EDID parsing, i renamed to monitor_range, i have r-b from AMD, if you also ack this i can merge this
18:30 mdnavare: vsyrjala: https://patchwork.freedesktop.org/series/74542/
18:30 mdnavare: vsyrjala: https://patchwork.freedesktop.org/series/74541/ on intel-gfx and all CI clean
18:30 mdnavare: vsyrjala: Also I think we need to add get_vrr_support () in intel_dp_connector_helper_funcs that can be then called from drm_single_connector_helper_probe() after getting modes
18:30 mdnavare: vsyrjala: in i915, this function will call drm_helper to read the ignore msa bit from dpcd, check if monitor range is > 10 and set the intel_dp->vrr_support bit
18:30 mdnavare: vsyrjala: Does this approach sound good?
18:30 mdnavare: kazlaus: What do you think for this approach?
18:33 lynxeye: vsyrjala: Yea, out-fence seems to be defined oddly. Normally a fence says that we are done with the action that references a buffer, for KMS this should clearly be the time of flipping _away_ from the buffer, not flipping _to_ the buffer.
18:34 lynxeye: is someone actually using the out-fence right now? it seems utterly useless with this definition...
18:34 jekstrand: Yeah, we really need two out fences. One for "we're going to be done with it soon" and one for "actually done"
18:35 jekstrand: If we want real implicit sync with a GPU driver, we very much need an out fence that's "actually done" or else it's useless.
18:36 jekstrand: Otherwise it's just more of the same "explicit fence that tells userspace to implicit fence" nonsense that we have with shmfence in X11
18:36 lynxeye: jekstrand: what do you need the "done with it soon" fence for?
18:37 jekstrand: That lets us communicate to userspace that the buffer is about to be free, so it can use it as its next buffer and start building GPU work to render to it.
18:38 jekstrand: That GPU work has to be blocked, of course, on the "actually done" fence
18:39 lynxeye: as long as we still don't allow specifying fences out of thin air, getting an out-fence from KMS should only be possible via an atomic commit. A successful commit then means that the buffer will be done in bounded time (after the next few vblanks, pessimistically).
18:40 krh: "done with it soon" doesn't sound like something that needs to be a fence... a protocol event more likely
18:40 jekstrand: Yeah, it could totally be a protocol event
18:40 jekstrand: And it is. It's wl_buffer::done :-)
18:40 jekstrand: or maybe it has a different name? Busy?
18:40 jekstrand: I don't remember
18:42 krh: XClientMessageEvent
18:44 jekstrand: But if that's all we get from our out-fence in atomic, then it only implements half of what we need for explicit and we're still doomed. :-(
18:48 krh: jekstrand: yea, not useful
18:49 jekstrand: MrCooper: I now rescind my comments about how everything is ok. :-P
19:00 daniels: KMS fences are fine, you just need to track state back more deeply
19:00 daniels: the fence means your previous configuration is now inactive
19:00 daniels: which means you can release any buffers used in your old config but not your new
19:00 krh: daniels: tell that to execbuffer2
19:01 jekstrand: does it actually mean "scanout on the old buffer is done" or just "we accepted the flip"?
19:04 daniels: jekstrand: the former
19:04 daniels: accepted the flip is useless, because that's when the ioctl returns
19:04 jekstrand: Ok, if it's true that the scanout HW will never touch the buffer after that fence is signalled, then it's fine.
19:04 jekstrand: And we're not doomed
19:05 krh: jekstrand: it won't touch *the previous* buffer
19:05 daniels: we have a ton of typing to do but we're not doomed
19:05 daniels: right
19:05 jekstrand: Oh, then that's ok
19:06 jekstrand: That's just bookkeeping in the compositor
19:06 jekstrand: Which is fine
19:06 jekstrand: When it scans out a new buffer, it can just grab the fence and stick it on the buffer return path for the old buffer.
19:06 jekstrand: It may have to merge the sync_file with one from previous compositing work or somesuch
19:06 jekstrand: But that's fine
19:08 daniels: Wayland compositors have to do that bookkeeping anyway if they want direct scanout
19:08 jekstrand: Yup
19:09 jekstrand: We're back to non-doomed. :-)
19:09 daniels: you should've asked :P
19:09 daniels: I know ndufresne had some thoughts on V4L2 as well
19:10 vsyrjala: kernel would need to do bookkeeping too if we want mailbox
19:11 daniels: the bigger problem with that is making sure we don't paint ourselves into a corner with Vulkan present timing + actual real window systems
19:14 jekstrand: daniels: When I say "non-doomed" I am, of course, referring to Wayland only. X11 is still doomed. :-P
19:15 vsyrjala: also the single out fence is a bit of a problem if you're doing a mailbox flip with multiple planes, and some have an already queued flip and some don't
19:15 jekstrand: vsyrjala: You can merge the fences
19:17 daniels: jekstrand: yeah but that's axiomatic tbf
19:17 daniels: vsyrjala: that, err, doesn't sound very atomic to me
19:17 vsyrjala: why not?
19:18 daniels: if we're introducing per-plane variability then we also need to signal back exactly which plane states made it and which didn't
19:18 daniels: also, bonus points if you can explain how this should meld with VRR
19:19 emersion: speaking of VRR… https://github.com/swaywm/sway/issues/5076#issuecomment-596523258
19:19 gitbot: swaywm issue 5076 in sway "Automatic VRR management" [Enhancement, Proposal, Open]
19:20 vsyrjala: i guess we end up just doing: more than one plane in your mailbox flip? -> -EINVAL
19:20 emersion: i might send an email to the list later, but: could the kernel avoid flickering VRR screens?
19:22 jekstrand: daniels: True
19:26 jekstrand: daniels: Any work happening in gnome-shell that you know of to implement explicit sync?
19:26 jekstrand: daniels: Or kwin?
19:27 vsyrjala: emersion: imo driver/hw bug if the thing flickers
19:28 emersion: hmm
19:28 emersion: amdgpu devs said this should be handled by user-space
19:28 vsyrjala: or crap display (which could be considered a hw bug too i guess)
19:28 emersion: it's a "normal" behaviour for a VRR screen to flicker
19:29 emersion: (when changing the refresh rate too fast)
19:29 vsyrjala: sounds like crap hardware then
19:29 emersion: it only happens on higher-end VRR screens (because they have a wider refresh rate range)
19:29 emersion: it's a physical limitation
19:30 vsyrjala: sounds like something the kernel driver could handle by enforcing a slew rate, if the hw can't do it properly
19:31 emersion: yeah, that's what we want to do
19:32 emersion: i'd prefer to do it in the kernel, but some say user-space should do it
19:32 vsyrjala: they're just lazy kernel devs
19:32 emersion: i'll just send an e-mail to the list
19:33 emersion: this has led to a pretty annoying situation: a blacklist in mesa of user-space apps that don't render at regular intervals
19:33 emersion: (you can find e.g. firefox in the list)
19:34 vsyrjala: i think intel hw has a programmable slew rate in hw actually
19:34 emersion: oh, that would be very *very* nice
19:34 emersion: and a very good reason to do it in the kernel
19:35 jekstrand: Yeah, the kernel can restrict the rate of change of the refresh rate
19:35 jekstrand: At least it should be able to do so fairly easily
19:35 mdnavare: emersion: Yes on intel HW there is a limitation on how fast the vrefresh can change between frames
19:35 emersion: excellent
19:38 vsyrjala: the problem might be how to come up with those numbers. dunno if there's anything in edid/dpcd to give us a hint, or if we just have to take a pessimistic guess
19:38 emersion: i don't think there's anything in the EDID
19:38 emersion: unfortunately
19:39 mdnavare: emersion: There is a register we need to program for this: it defines the maximum VBLANK shift allowed between successive frames.
19:39 emersion: cool
19:40 mdnavare: emersion: Smallest vblank = previous frame's vblank end - vblank max shift decrement (similarly for the largest vblank), but we need to figure out how to obtain these values for programming VRR in the atomic commit before enabling VRR
19:41 emersion: indeed
19:41 emersion: mupuf suggested having DRM properties to set this up
19:42 mdnavare: emersion: properties even for the maxshift?
19:42 emersion: best would be to ask someone who knows how VRR screens work
19:42 mdnavare: emersion: I thought we just decided on properties for vmin and vmax supported as advertised from the edid
19:42 emersion: mdnavare: https://github.com/swaywm/sway/issues/5076#issuecomment-597028414
19:42 gitbot: swaywm issue 5076 in sway "Automatic VRR management" [Enhancement, Proposal, Open]
19:42 vsyrjala: i think what we need to do is pick a safe default, and then maybe allow the user to optimize it via props
19:43 emersion: maybe the limit doesn't change a lot between screens
19:43 emersion: mdnavare: yeah not sure about exposing vmin/vmax via a prop
19:44 emersion: anyway, a prop is for bonus points
19:44 emersion: having something that doesn't flicker would be the first thing to aim for
19:44 mdnavare: emersion: mupuf suggests a prop for vrr_max_shift, vrr_min and max
19:45 emersion: yeah
19:45 emersion: users could just override the EDID to adjust vrr_min/vrr_max
19:45 emersion: and arguably if they need to do that, the EDID is broken
19:46 jstultz: curious, are the mesa lima tests expected to work against the 20.0 branch right now? that one failed for me and I'm trying to understand if it's a blip or not..
19:46 mdnavare: emersion: I think our hw will handle this properly, since we program vrr min and max; so if the render rate suddenly drops from say 60 to 0, the vblank will terminate at the max vblank and the flip will happen at the lowest refresh rate
19:47 emersion: this will flicker, mdnavare
19:47 emersion: but with a slew rate it'll be fine
19:47 mdnavare: emersion: why would this flicker?
19:48 emersion: mdnavare: see slide 19 https://xdc2019.x.org/event/5/contributions/331/attachments/424/675/HarryWentland-Freesync-20191003.pdf
19:53 mdnavare: emersion: static flicker due to luminance drop?
19:54 emersion: i've mainly noticed dynamic flicker
19:54 emersion: i think static flicker can just be fixed with a reasonable vrr_min
19:59 eric_engestrom: jstultz: they should work; is it the cache_test failing? it's known to be flaky, there's work going on to fix it. if it's something else, can you raise an issue and tag me (@eric)? I'll look at it tomorrow
20:00 jstultz: eric_engestrom: I think it's dEQP-GLES2.functional.default_vertex_attrib.float.vertex_attrib_4f,Fail ?
20:01 jstultz: eric_engestrom: https://gitlab.freedesktop.org/john.stultz/mesa/-/jobs/1905004
20:01 jstultz: eric_engestrom: though i don't know how the changes in my MR would affect that.. (it's all android build fixes)
20:03 anarsoul|2: jstultz: IIRC there was an MR to disable this test on lima in 20.0
20:03 anarsoul|2: rellla: ^^
20:03 anarsoul|2: jstultz: just restart the job, it's a flake
20:03 jstultz: anarsoul|2: ah! thanks!
20:12 eric_engestrom: jstultz: could you target that MR for staging/20.0 instead? that's the working branch, while 20.0 contains the final results
20:13 jstultz: eric_engestrom: sure! sort of figuring out the process here, sorry!
20:13 eric_engestrom: no worries :)
20:14 eric_engestrom: looks good to me btw, if dcbaker hasn't merged it tomorrow or commented on it I'll merge it
20:14 * eric_engestrom logs off for the day
20:19 emersion: okay, sent a mail to dri-devel https://lists.freedesktop.org/archives/dri-devel/2020-March/258940.html
20:24 airlied: mripard: did the backmerge yesterday, forgot to push it out, pushed it out now
20:35 Venemo: jekstrand: would it make sense to you to add a cluster size to NIR's shuffle intrinsic?
20:35 jekstrand: Venemo: What would it mean?
20:37 Venemo: jekstrand: for example, a quad broadcast with a dynamic index would be equivalent to a shuffle with a cluster size of 4. basically the indices would mean the given index inside the cluster
20:37 mdnavare: emersion: Thanks Simon, I will also respond to that
20:37 emersion: thanks!
20:38 emersion: by the way, ping me if you need any user-space implementation for a new DRM prop
20:38 mdnavare: emersion: yes absolutely
20:38 robclark: huh, this doesn't look right.. maybe a thing nir_validate should catch...
20:38 robclark: vec1 32 ssa_364 = imadsh_mix16 ssa_73, ssa_93.x, ssa_363
20:38 robclark: vec4 32 ssa_365 = mov ssa_364.xyzw
20:38 mdnavare: emersion: Currently we are just working on attaching the already existing DRM props in i915, then will add the vrr min and max prop
20:38 jekstrand: Venemo: Hrm... So it would implicitly do an idx = idx + (gl_SubgroupInvocation & (cluster_size - 1))?
20:38 emersion: mdnavare: looks good
20:39 jekstrand: robclark: Yeah, that's wrong.
20:39 jekstrand: robclark: nir_validate doesn't catch that?!?!?!?
20:39 mdnavare: emersion: Just merged the patches to drm-misc for parsing EDID for this vrr min-max
20:39 robclark: doesn't seem to.. which surprised me a bit
20:39 jekstrand: Yeah, that seems off
20:39 robclark: I guess fixing nir_validate will make it easier for me to track down where this comes from
20:40 jekstrand: yeah
20:40 emersion: yeah, saw that, good thing you've talked to amd devs to use the same functions
20:40 jekstrand: robclark: validate_alu_src looks like it should be catching that
20:40 Venemo: jekstrand: yes. the reason I'm asking is that we have some hw where the hardware can only shuffle within 32 threads (not all 64), so we need some hacks to make those into a 'full' shuffle. now, those hacks are not needed when it's a quad broadcast. but there is no way to distinguish between the shuffle lowered from a quad broadcast and a 'full' shuffle.
20:41 jekstrand: Venemo: I see
20:42 jekstrand: Venemo: On said hardware, does the 32-thread shuffle act as a shuffle which only works in some cases, or as a clustered shuffle with a cluster size of 32?
20:42 robclark: hmm, yeah, asrc->src.ssa->num_components==1
20:43 Venemo: jekstrand: on said hardware, the indices are meant as modulo 32 within the 'cluster'. so on threads 0-31, the index 32 is equivalent to 0, 33 is equivalent to 1, etc.
20:44 jekstrand: Ok, so it's within a cluster
20:44 Venemo: yes. specifically, I'm talking about the ds_bpermute instruction, which works like this on Navi.
20:45 jekstrand: Venemo: The tricky bit here is that then you'd have to tell nir_lower_subgroups exactly what your shuffle cluster size is.
20:45 jekstrand: Unless you want it to just give you a cluster 4 shuffle?
20:45 Venemo: the default would be 0, meaning 'all', and when lowering the quad broadcast it would be 4.
20:45 jekstrand: If you can handle a cluster size of 4 in the back-end, why not just disable that bit of lowering?
20:46 robclark: oh, my bad.. I had NIR_VALIDATE=0 in my env
20:46 jekstrand: robclark: :-P
20:47 Venemo: I guess the real question is, should we lower these dynamic quad broadcasts in the backend or not.
20:47 Venemo: if we do, then sure I can emit whatever works best on the hw
20:49 Venemo: but then that requires different special casing in the backend
20:49 Lyude: agd5f, skeggsb, j4ni: jfyi I'm just going to go ahead and push https://patchwork.freedesktop.org/series/74412/ into drm-misc-next, wanted to let you guys know since it's touching a couple drivers. I was hoping to at least get an ok from one of the drm-misc maintainers but I haven't been able to get in contact with any of them on irc :S
20:50 * robclark suspects that lowering a vec4 imul to imadsh_mix16 which is defined as scalar goes badly
20:50 robclark: otoh imul is probably a thing that I don't want to be vector..
20:53 bnieuwenhuizen: jekstrand: how are you implementing submits that do not wait on an implicit exclusive fence?
20:54 jekstrand: bnieuwenhuizen: i915 is more decoupled than amdgpu
20:54 jekstrand: bnieuwenhuizen: We have an EXEC_OBJECT_ASYNC flag which says "yeah, this is a dma-buf; ignore synchronization on it"
20:55 bnieuwenhuizen: jekstrand: decoupled in what sense?
20:55 jekstrand: bnieuwenhuizen: ickle could tell you more about what that does inside i915
20:55 jekstrand: bnieuwenhuizen: We don't have to synchronize execution in order to have a BO resident
20:57 bnieuwenhuizen: jekstrand: how do you prevent moving the buffer before all execbuffers writing to it have finished?
20:58 jekstrand: bnieuwenhuizen: That's a question for ickle, I'm afraid.
20:59 robclark: jfwiw, the issue was with the definition of imadsh_mix16.. letting that be vector fixed things.. or, well, maybe nir_opt_algebraic should somehow know that it can't lower vector things to scalar things..
20:59 jekstrand: bnieuwenhuizen: For one thing, buffers rarely ever "move"
20:59 ickle: moving buffers around is a kernel problem, userspace asking nicely to not wait isn't taken into consideration
20:59 jekstrand: bnieuwenhuizen: That only happens if we swap to disk and if you're doing that, you're dead anyway.
21:00 jekstrand: The rest of the time, we can go ahead and do stuff async.
21:00 bnieuwenhuizen: jekstrand: until your working set does not fit in VRAM (yes yes, Intel does IGPs, but for how long ...)
21:00 jekstrand: bnieuwenhuizen: Sure. Then it'll have to swap the working set in and out
21:00 jekstrand: But even there, it's not as simple as "wait for the gpu to be done" because we have preemption
21:01 jekstrand: So the kernel can preempt, move the buffer, set up the new page tables, and continue execution and the userspace batch shouldn't notice.
21:01 Lyude: airlied: you ever seen this before? https://paste.centos.org/view/4f0df104 was trying to create the topic branch you asked for with the mst bw regression fixes, but I can't seem to push it
21:01 agd5f: Lyude, sounds good to me
21:03 airlied: Lyude: weird, never seen that
21:04 Lyude: airlied: and that command I'm using looks sensible, right? I remember last time I had problems it was just me using dim wrong
21:04 agd5f: Lyude, disk space might be an issue on fdo
21:04 Lyude: daniels: ^ any idea btw?
21:04 Lyude: ooh, yeah that would explain it :s
21:04 agd5f: I think I ran into that a few months ago
21:10 Lyude: airlied: want me to push this branch somewhere else (fdo gitlab maybe?) or would you rather we just wait for the fdo admins
21:17 Venemo: jekstrand: so your suggestion would be to just lower this in the backend?
21:19 jekstrand: Venemo: I don't know that it's a strong suggestion
21:19 jekstrand: But it seems like you can do the lowering in the back-end just as easily as you can handle a cluster=4 shuffle
21:19 jekstrand: Unless I'm missing something?
21:20 Venemo: jekstrand: it's probably not that big of a deal either way. but if there were other people who thought adding the cluster size to the shuffle helps them too, then I'd lean towards that.
21:21 jekstrand: For us, it doesn't help. We'd just want nir_lower_subgroups to lower it away to a normal shuffle
21:21 Venemo: okay
21:21 airlied: Lyude: let me dig into it a bit,
21:22 jekstrand: Venemo: The one way it might help is it'd let us drop an opcode
21:22 jekstrand: Venemo: If we had a cluster size parameter, I'd be inclined to do the lowering in spirv_to_nir and not have a separate quad shuffle opcode.
21:23 airlied: Lyude: pushing to wrong tree
21:23 jekstrand: Wait... No, we'd need a cluster broadcast opcode not shuffle
21:23 airlied: should be pushing to drm-misc-next tree I think
21:23 airlied: not sure if dim does drm-misc topic branches
21:23 Lyude: ah, I think it should
21:23 airlied: Lyude: dim create-branch drm-misc/topic/mst-bw-check-fixes
21:23 airlied: might do it
21:24 Lyude: airlied: bingo
21:24 Venemo: jekstrand: for us, it's not an easy answer, because some hw doesn't support shuffle at all (I have a MR which adds this case to nir_lower_subgroups). the other consideration is that in some cases we could emit more efficient code if we lowered it ourselves rather than in NIR.
21:26 Venemo: jekstrand: I have a patch in MR 4147 which addresses hw without shuffle. does that patch look sane to you?
21:29 Venemo: jekstrand: w.r.t adding the cluster size to shuffle: if it doesn't help in the grand scheme, then let's drop the idea.
21:31 jekstrand: Venemo: Ok
21:37 Lyude: airlied: ok, branch should be ready at topic/mst-bw-check-fixes-for-airlied (I'm assuming I don't need to do the dim pull-request step since I'm poking you on irc)
21:45 Venemo: jekstrand: don't worry, we don't want to do that for shuffle
21:49 Venemo: jekstrand: on old hw, we simply don't support shuffle. we might do something about it in the future but it's just not possible to make an efficient implementation on these
21:50 jekstrand: Yeah
22:03 airlied: Lyude: you do :)
22:04 airlied: so it can go into patchwork and I can get all the nice links
22:18 Lyude: airlied: ah, got it
22:31 Lyude: airlied: gah, does an email for this need to be sent out? just finished running dim pull-request, but after pushing the git tag it broke with fatal: Not a valid revision: drm-misc
22:31 anholt_: db410c should no longer be blocking people's pipelines, as we now have 7 of them running instead of 1. let me know if you see instability on those (a306 jobs).
22:32 airlied: Lyude: dim pull-request <branch> origin/master ?
22:32 airlied: Lyude: oh I can probably pull without an email but it's painful
22:33 Lyude: airlied: that started pulling in a bunch of patches so I don't think that will work, let me try something else
22:33 airlied: Lyude: is it branched off master or drm-next?
22:33 airlied: oh but it's for fixes
22:34 airlied: I can fast forward fixes to rc5 at least
22:34 Lyude: airlied: drm-misc/drm-misc-next I think; also, using that as the baseline seems to have worked
22:35 Lyude: oh - whoops, no: the time it was pulling in the extra patches was when I tried origin/master like you asked
22:35 airlied: Lyude: yeah that isn't a good baseline
22:35 airlied: it needs to baseline from v5.6-rc5 or drm-fixes (which is now that)
22:35 airlied: otherwise I'll get all the next stuff in fixes
22:35 Lyude: airlied: gotcha
22:48 Lyude: airlied: ok - I deleted the old topic branch, about to make a new one with the correct baseline, but unfortunately I'm still a little confused :(. So, we want the baseline for the pull request to be drm/drm-fixes, but drm-fixes and drm don't work as the baseline with dim create-branch ${baseline}/topic/mst-bw-check-fixes-for-airlied. Am I supposed to use two different baselines?
23:04 Lyude: airlied: I've gotta head out now but i'll take a look at your response when I get back
23:22 airlied: Lyude: let me go look at dim
23:23 airlied: dim create-branch drm-misc/branchname drm/drm-fixes seems like it should work
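Putting the thread together, the flow would be roughly (a sketch assembled from the dim commands quoted above):

    # create the topic branch in drm-misc, baselined on drm-fixes
    dim create-branch drm-misc/topic/mst-bw-check-fixes drm/drm-fixes
    # apply and push the fixes, then generate the pull request mail
    # against the same baseline
    dim pull-request topic/mst-bw-check-fixes drm/drm-fixes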