01:20 mareko: does SPIR-V have dual-slot dvec4 VS inputs?
01:22 pendingchaos: yes
01:22 pendingchaos: radv implements 64-bit vertex inputs
04:34 mareko: actually gl_spirv seems to do it correctly
06:13 mareko: interesting fact: the current varying linker doesn't fully remove VS inputs (the vertex buffers and elements are still set even if unused), but my new varying linker does remove them properly
06:16 mareko: and I wondered why a display list had no vertex attribs... well they were eliminated by the linker
10:00 jani: hey, I pushed v6.3-rc2 to drm-intel-fixes and got a conflict in cirrus when merging drm-misc-next
10:02 jani: tzimmermann: you were apparently involved in it, any chance you could resolve it?
10:03 tzimmermann: jani, give me a minute. there was a merge conflict during my cirrus push. working on it
10:03 jani: tzimmermann: ah, okay, it wasn't me then. thanks!
10:16 vsyrjala: fyi i915 doesn't currently build with WERROR=y. fix here: https://patchwork.freedesktop.org/series/115046/
10:27 tzimmermann: jani, should work now
10:40 jani: tzimmermann: thanks
10:59 jani: vsyrjala: did you fix it with a conflict resolution?
10:59 vsyrjala: i just pushed a fixup patch for it
10:59 vsyrjala: there is no conflict
11:00 jani: right, understood
11:09 mareko: apparently nobody in the industry does VS->TCS varying opts in the linker, otherwise they would have noticed that GLCTS contains a test that has a dead uniform only kept alive by a dead output
11:10 mareko: and it fails if it's eliminated
11:14 zmike: unsurprising
11:15 zmike: will you file an issue for it?
11:15 zmike: or should I
13:05 dwlsalmeida: Lynne hey, if you have the time, I wonder what you think about the provisional vp9 vulkan video api? airlied apparently got it to work on radv after some work of his own on top
13:44 LuckyKnight: Anyone else have a compilation error with Mesa v23.0.0?
13:44 LuckyKnight: 17:28:10 ../src/egl/drivers/dri2/platform_x11.c: In function ‘dri2_x11_get_msc_rate’:
13:44 LuckyKnight: 17:28:10 ../src/egl/drivers/dri2/platform_x11.c:1213:4: error: implicit declaration of function ‘loader_dri3_update_screen_resources’ [-Werror=implicit-function-declaration]
13:57 DavidHeidelberg[m]: LuckyKnight: seems like it's true: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8476
13:58 DavidHeidelberg[m]: aaand it's you.. hmm.
14:05 MrCooper: looks like loader_dri3_helper.h isn't getting included for some reason
14:07 psykose: it only gets included with dri3 enabled afaict
14:08 psykose: i guess it's some niche case where those options don't enable dri3 (which is weird) but are building the dri2 code
14:08 psykose: (the file itself only includes the helper header when dri3 is enabled, but then doesn't compile without it?)
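A minimal sketch of the include/guard pattern being described, assuming the HAVE_DRI3 define the dri2 x11 code already keys on; whether this matches the fix that actually landed is not confirmed here:

    /* loader_dri3_helper.h (and with it the declaration of
     * loader_dri3_update_screen_resources) is only pulled in when dri3 is
     * enabled, so any call site reachable from the shared dri2 x11 code
     * needs the same guard, or the include has to become unconditional. */
    #ifdef HAVE_DRI3
    #include "loader_dri3_helper.h"
    #endif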
15:23 eric_engestrom: I encourage everyone who has jobs in the CI to take a look at how long they take and reduce the coverage (or increase `parallel:` and the number of runners for those who can) in order to keep it fast enough to not be too much of a burden :)
15:23 eric_engestrom: to lead by example: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21874
15:25 daniels: eric_engestrom: fwiw we've mostly been trying to keep ours at 15, assuming that the build will complete in several minutes
15:26 daniels: we've done a bunch of stuff to cut test load, and I'm waiting until things stabilise before I start throwing out some of the slower tests as that changes the caselist so needs a stress test
15:26 daniels: the biggest issue atm is that some of the Chromebooks have quite unreliable UART, so we need to retry far too often for my liking - we're waiting for a LAVA fix to land for that hopefully very soon which will cut the worst-case times in half
15:27 eric_engestrom: I didn't look at the docs recently but iirc the target was 10 minutes
15:27 eric_engestrom: if it was bumped to 15, perhaps we should bring it back to 10?
15:28 jenatali: I've been doing my best to keep to the spirit of that goal. If anybody notices the Windows jobs starting to be on the longer side of the job length spectrum please let me know and we can reduce coverage or something
15:34 Wallbraker: Speaking of CI, has one of the Windows runners fallen off? Tags: docker/windows/2022 are not showing up.
15:35 Wallbraker: Oops, wrong channel, reposting in #_oftc_#freedesktop:matrix.org.
16:08 bbrezillon: gfxstrand, danvet, robclark: I was discussing page table pre-allocation with robmur01, to guarantee that no attempt to allocate page tables in the run_job() path would deadlock with the shrinker, and robmur01 asked if we could allocate with GFP_NOWAIT, and simply return an ERR_PTR(-ENOMEM) if the allocation fails. This should result in the job finished fence being signaled right
16:08 bbrezillon: away, thus unblocking the shrinker.
16:08 bbrezillon: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler/sched_main.c#L1016
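A rough sketch of what that could look like in a driver's run_job() callback; all the my_* names below are hypothetical, and only GFP_NOWAIT, ERR_PTR() and the drm_sched types are real:

    #include <drm/gpu_scheduler.h>
    #include <linux/err.h>
    #include <linux/gfp.h>

    struct my_job {                     /* hypothetical driver job */
        struct drm_sched_job base;
        struct my_vm *vm;
        u64 va, size;
    };

    /* hypothetical helpers: map the range using only non-sleeping
     * allocations, and kick the hardware */
    int my_vm_map(struct my_vm *vm, u64 va, u64 size, gfp_t gfp);
    struct dma_fence *my_hw_submit(struct my_job *job);

    static struct dma_fence *my_run_job(struct drm_sched_job *sched_job)
    {
        struct my_job *job = container_of(sched_job, struct my_job, base);
        int ret;

        /* GFP_NOWAIT: fail immediately instead of entering reclaim, so
         * this path can never end up waiting on the shrinker. */
        ret = my_vm_map(job->vm, job->va, job->size, GFP_NOWAIT);
        if (ret)
            /* The scheduler sets the error on the finished fence and
             * signals it right away (see the sched_main.c link above),
             * which is what unblocks the shrinker. */
            return ERR_PTR(ret);

        return my_hw_submit(job);
    }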
16:10 robclark: daniels, eric_engestrom: are there metrics somewhere on how long various CI jobs take.. I thought there was some grafana thing but I'm failing to find it now
16:11 daniels: robclark: look for the 'ci daily' label on issues - that shows which jobs routinely take the longest
16:11 robclark: bbrezillon: I was considering GFP_NOWAIT and just treating it as a gpu hang if it fails.. but that seems a bit severe
16:11 robclark: daniels: thx
16:11 daniels: anything for stoney/volteer/sarien is going to get inflated since we regularly have to retry due to UART death
16:11 gfxstrand: bbrezillon: Yeah, it depends on just how bad of a corner you have to get wedged into for that to happen.
16:12 gfxstrand: That is the worst-case fallback but we need to be sure it won't happen in practice unless you're in an OOM situation so bad that it's only a matter of time before the system crashes.
16:13 robmur01: gfxstrand: said pagetables are always regular kernel pages allocated one at a time, so failure does imply system-wide memory pressure in general (which is why I'm inclined to think it's OK)
16:16 robclark: the bad thing about VM_BIND going badly is you pretty much need to throw the whole context away and start over since it is an async error
16:16 bbrezillon: yeah, there's that
16:17 robclark: it's probably ok as a way to unblock things and move fwd.. but I kinda suspect it will be something to revisit
16:21 robclark: daniels: this is impressive.. https://gitlab.freedesktop.org/mesa/mesa/-/jobs/37077252 ... "Elapsed time: 20643 minutes 7 seconds"
16:27 gfxstrand: airlied: ^^
16:28 gfxstrand: So, one thing I've considered is to have a concept of a VA reservation that happens as a separate ioctl, which would give the kernel an opportunity to allocate pages for page tables. The guarantee would then be that as long as your VM_BIND happens within a reserved range, it can't cause a context loss due to OOM.
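Purely as an illustration of that idea (none of these names exist in any driver's uAPI), the reservation could look something like:

    #include <drm/drm.h>   /* __u64/__u32 types, DRM_IOWR, DRM_COMMAND_BASE */

    /* Hypothetical ioctl: reserve a GPU VA range up front so the kernel can
     * allocate all the page-table pages covering it. Any later VM_BIND that
     * stays inside [va, va + size) then cannot hit -ENOMEM in the async path. */
    struct my_drm_va_reserve {
        __u64 va;      /* start of the reserved GPU VA range */
        __u64 size;    /* size in bytes, page aligned */
        __u32 flags;   /* none defined in this sketch */
        __u32 pad;
    };

    #define MY_DRM_IOCTL_VA_RESERVE \
        DRM_IOWR(DRM_COMMAND_BASE + 0x00, struct my_drm_va_reserve)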
16:29 robclark: gfxstrand: that could always be done as the sync part of the VM_BIND ioctl.. the problem really is just that robmur01 doesn't want to break the io-pgtable abstraction by letting the drm driver allocate its pages ;-)
16:31 gfxstrand: robclark: Yeah, I've thought about that too.
16:31 gfxstrand: robclark: Yeah, if io-pagetable is your abstraction, I can see that being... tricky.
16:32 gfxstrand: CPU and GPU make very different assumptions here
16:32 robmur01: It's not that I don't want to per se, just that it would be fiddly since it's really not designed that way - the point was to abstract the details that callers shouldn't have to care about, so guessing how many pages a given mapping might need is probably far from straightforward
16:33 robclark: it's all just code. it can be changed.. but seems fine to punt on that bit and just use GFP_NOWAIT to start
16:34 robclark: robmur01: hand-wavey idea I had was that it should be possible to calculate an upper bound, and the drm driver just makes sure there are at least that many pages in its pool of pre-allocated pages in the sync part of the ioctl
16:34 robclark: so it doesn't need to be exact, just a worst-case figure that io-pgtable hands back
16:45 gfxstrand: Yeah, figuring out how many pages an allocation would need is fiddly. You can figure out a worst case pretty easily if you know the page table structure. However, once you take various forms of hugepages into account, getting an exact count is impossible until you go to map. You can also end up increasing the PT count on unmap when you unmap part of a hugepage, if you haven't reserved a worst-case amount.
16:48 robclark: right.. but at the cost of wasting some small # of pages I think it should be doable to just ensure there is always a worst-case amount available.. ie in the unmap path you can split at most two huge pages so just need that # of spare pages around
16:53 eric_engestrom: DavidHeidelberg[m]: feature request for ci_run_n_monitor: ability to pass `--pipeline 123` instead of `--rev $sha`, for when it fails to find the pipeline (or picks the wrong one) :)
16:58 danvet: bbrezillon, does this mean the job fails when userspace would have legitimately expected it to succeed?
17:10 gfxstrand: robclark: For a single unmap, yes. But you may have N unmaps queued which each split two hugepages.
17:13 robclark: yeah
17:13 robclark: the question is whether 2*N pages lying around is reasonable or not. I guess the upper limit would be worse in the map case
17:14 robclark: I haven't done the math yet.. but it _seems_ like there could be a reasonable upper bound
17:15 robmur01: yup, worst-case means preallocating somewhat more than 1/512 * N * M, where N is the maximum size of any mapping and M is the maximum number of mappings which may happen concurrently
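Under the usual assumption of 4 KiB pages and 512 entries per table level, that bound works out to something like the following (sketch only; both helper names are made up):

    #include <linux/math.h>   /* DIV_ROUND_UP */

    /* Worst-case page-table pages needed for one mapping of 'pages' 4 KiB
     * pages: one leaf table per 512 pages, one table per 512 leaf tables,
     * and so on up the levels - hence "somewhat more than pages / 512". */
    static unsigned long pt_pages_worst_case(unsigned long pages)
    {
        unsigned long total = 0;

        while (pages) {
            pages = DIV_ROUND_UP(pages, 512);
            total += pages;
            if (pages == 1)
                break;
        }
        return total;
    }

    /* robmur01's bound: enough for M concurrent mappings of at most N bytes
     * each, i.e. somewhat more than 1/512 * N * M (with N counted in pages). */
    static unsigned long prealloc_target(unsigned long max_map_bytes,
                                         unsigned long max_concurrent)
    {
        /* >> 12: bytes to 4 KiB pages */
        return pt_pages_worst_case(max_map_bytes >> 12) * max_concurrent;
    }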
17:17 robclark: that doesn't seem too bad.. and a lot more reasonable than trying to calc an exact figure
17:18 robmur01: given that it only matters in low-memory situations, holding on to that many pages which in practice we almost certainly don't need seems counterproductive :/
17:18 robclark: I guess you'll have a lot more pages of GEM buffers
17:19 robclark: but.. I think it would be ok to start with gfp flags to get something working as a first step
17:21 robmur01: not as if it's hard to reach OOM on typical things that run Panfrost, to test the real-world impact :)
17:23 robmur01: TBH once I get the freelist stuff hooked up in io-pgtable there will likely already be a temporary freelist in map, so recycling pages off that isn't much of a stretch.
17:25 robmur01: From there it's not too unthinkable to potentially pass in a non-empty "freelist" from outside, but what I really wouldn't want to have to deal with is synchronisation if said list is scoped any wider than per-call (like the freelist is in normal operation)
17:27 robclark: pretty much any chromebook with <= 4G of ram (plus zram swap) is doing lots of reclaim by the time you open up an average # of tabs
17:43 cwabbott: dschuermann_: here's another fun issue with divergence analysis - we have to use a vector register for loading SSBOs even when loading from a uniform offset, and it appears that at least on certain gens loads can "tear" if there's an earlier store without a fence in between, where some threads in the wave see the earlier value and some see the later value, even if all threads are loading from the same location
17:44 cwabbott: the result is that the output isn't actually uniform and bad things happen when we assume it is (in this case skipping reconvergence for a branch based on its value)
17:44 cwabbott: I hope that doesn't happen on amd
17:45 cwabbott: (spec@arb_shader_storage_buffer_object@execution@ssbo-atomiccompswap-int is the test affected, for reference)
17:47 cwabbott: I wonder if the test is wrong for not including a fence between the atomic op and the load
18:04 cwabbott: hmm, seems like vulkan memory model is much more strict here than the GLSL spec is
18:05 cwabbott: vulkan just says "Applications must ensure that no data races occur during the execution of their application." whereas GLSL has more hand-wavey language about stuff not becoming available
18:05 cwabbott: *about writes not becoming available
18:14 lumag: jani, excuse me, just wanted to check for any updates on the dsc helper series validation? I can ping the drm-misc maintainers to get patches 1-5 in, but I'd probably hold off on that if we can get more of the series reviewed, or if I need to update any of the patches.
19:28 jani: lumag: would be great to get other folks to review the various tables. I don't have the time right now
19:29 lumag: jani, I see
19:29 lumag: abhinav__, would it be possible for somebody to help with tables review?
19:32 lumag: jani, any other possible reviewers from Intel? AMD uses a different approach, so they can't help us here.
19:42 zmike: DavidHeidelberg[m]: is it intended that virgl-iris-traces runs on all jobs? I just got one on a zink-only pipeline
19:44 anholt: zmike: it won't run, that's .test-manual-mr.
19:44 zmike: ah
19:44 zmike: tremendous
19:57 abhinav__: lumag QC will help with the tables in drm/dsc parts
19:58 abhinav__: for the i915/display/intel_vdsc.c part of https://patchwork.freedesktop.org/patch/525445/?series=114472&rev=2 , I would still prefer an intel reviewer
19:58 abhinav__: unless you want to split that change to a different one
19:59 abhinav__: and keep the i915 part confined to a separate change
20:01 lumag: abhinav__, it's not possible
20:02 lumag: but reviewing the rest would also be helpful
20:05 jani: lumag: maybe the folks involved in https://patchwork.freedesktop.org/series/114246/, e.g. aknautiy_
20:19 lumag: jani, ack, let's see if aknautiy_ responds