01:01 robclark: DemiMarie: I mean pinning all the memory would make things simpler but would be quite suboptimal if you don't consider infinite RAM as a for-free thing
01:02 DemiMarie: robclark: this is where I wish graphics supported recoverable page faults
01:04 robclark: hmm, some things can.. but would perhaps not be optimal.. I mean page fault on cpu stalls one thread.. page fault on gpu stalls perhaps 100's of threads
01:04 robclark: so even if we *could* do it (depending on hw and gen) doesn't mean we *should*
01:58 luc: Hi, I'm trying to understand the discussion on dma_fence in the channel and have a question: if 'no allocation in the scheduler path' is a rule, I wonder if this might get stuck https://elixir.bootlin.com/linux/v6.3-rc6/source/drivers/gpu/drm/panfrost/panfrost_job.c#L93
03:22 robclark: luc: that could be problematic, but from a quick look the panfrost shrinker doesn't wait on fences.. so probably too primitive to run into problems.. also the count_objects impl iterating shrinker_list under a device-global lock is going to mean the system is unusable under sufficient memory pressure before you hit the reclaim deadlock problem
03:22 robclark: so basically not a problem simply because you have bigger problems ;-)
05:17 mareko: cwabbott: I can only see block-level dominance, which is used by nir_opt_cse
05:21 mareko: cwabbott: there is some interesting stuff we could do with instruction-level dominance: if an instruction dominates 2 outputs, the instruction result can become a new output, and the dominated instructions computing the 2 outputs can be moved into the next shader; also, if an instruction post-dominates 2 inputs, the instruction result can become a new input and the post-dominated instructions can be
05:21 mareko: moved into the previous shader
07:42 tomeu: David Heidelberg: not sure, but I would expect to have picked it up from the tree of a freedreno dev?
09:10 hakzsam: https://gitlab.freedesktop.org/mesa/mesa/-/pipelines/852553 -> stoney jobs are way too long
10:06 daniels: hakzsam: yeah, that's hopefully being fixed this week
10:06 hakzsam: cool
12:25 luc: what if a job using the entry BO is not yet finished in 2 seconds? https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/panfrost/lib/pan_bo.c#L249
14:49 mareko: forking nir_dominance.c and replacing nir_block with nir_instr did the job :)
15:08 zmike: mareko: any other comments on my glthread MR? would like to get that merged before branch
15:53 cwabbott: mareko: uhh, no, that would be a terrible idea
15:53 cwabbott: don't do that
15:54 cwabbott: I'm not at a computer now to tell you the exact name but we have something to tell you if an instruction dominates another
15:56 cwabbott: replicating all the dominance stuff per instruction when dominance within a block is trivial would be... not smart
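(For context, a toy model of the distinction cwabbott is making. These types and names are made up for illustration, not the actual NIR API: the point is that within one block dominance reduces to comparing instruction indices, and only the cross-block case needs real dominance information.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy CFG model: each block records its immediate dominator
 * (the root block points at itself). */
struct block { struct block *imm_dom; };
struct instr { struct block *block; int index; };

/* Walk the immediate-dominator chain of 'child' looking for 'parent'. */
static bool block_dominates(const struct block *parent, const struct block *child)
{
    for (const struct block *b = child; b != NULL; b = b->imm_dom) {
        if (b == parent)
            return true;
        if (b->imm_dom == b)   /* reached the root without finding parent */
            return false;
    }
    return false;
}

/* Instruction-level dominance without any per-instruction data structure:
 * same block -> just compare positions; different blocks -> block dominance. */
static bool instr_dominates(const struct instr *a, const struct instr *b)
{
    if (a->block == b->block)
        return a->index < b->index;
    return block_dominates(a->block, b->block);
}
```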
17:34 tomeu: luc: don't know if it has changed since (it may have because of OpenCL), but it used to be that such a job would have timed out in the kernel by then
17:53 DavidHeidelberg[m]: mareko: could ` mesa: Enable NV_texture_barrier in GLES2+ ` introduce failure in SKQP test: gles_lcdblendmodes ?
17:54 DavidHeidelberg[m]: specifically a flaky failure
17:54 DavidHeidelberg[m]: asking because the flake was first seen in this MR
17:56 tomeu: I'm a bit lost trying to figure out what is lowering my clz to ufind_msb
17:57 tomeu: the clz is in the spirv, but something is lowering it to ufind_msb even if I have .lower_uclz = false,
17:58 tomeu: and I cannot find what code is doing that, even if such an optimization is in nir_opt_algebraic.py
17:58 tomeu: but I don't see it in nir_opt_algebraic.c
17:59 tomeu: my GPU does have a CLZ instruction, that's why I would prefer not to have that lowering
17:59 jenatali: tomeu: Have you tried NIR_DEBUG?
18:03 tomeu: jenatali: ufind_msb is already there in the first dumped shader
18:04 tomeu: it is as if it was done by vtn, but I haven't found the code that does it
18:04 jenatali: Huh weird
18:04 tomeu: oh crap, I see now what is going on
18:05 tomeu: nir_clz_u isn't really emitting clz()
18:05 tomeu: it is as if the lowering pass had been moved up to vtn
18:06 tomeu: I assumed that nir_clz_u would be emitting nir_op_uclz
18:06 tomeu: airlied: that seems to be your doing, do you remember why you went that way?
18:07 zmike: he's gone on holiday for some weeks
18:12 DavidHeidelberg[m]: sorry mareko, the ping about `mesa: Enable NV_texture_barrier in GLES2+` was a mistake, it should have gone to ajax
18:19 robclark: DavidHeidelberg[m]: it does seem highly likely that NV_texture_barrier could be implicated in the gles_lcdblendmodes flake.. was it already marked flaky for the gl_lcdblendmodes version of the test?
18:22 DavidHeidelberg[m]: cmarcelo: could you stop and postpone run of the jobs radeonsi-raven-* from your pipeline?
18:23 DavidHeidelberg[m]: we're currently having outages and I need to verify some flakes hitting main mesa
18:23 DavidHeidelberg[m]: outage (only 2 of 5 machines are available :( )
18:24 cmarcelo: DavidHeidelberg[m]: sure
18:24 DavidHeidelberg[m]: Thanks 🙏
18:27 cmarcelo: DavidHeidelberg[m]: I think I managed to cancel all of them.
18:30 DavidHeidelberg[m]: yup, thank you!
18:50 cmarcelo: anyone familiar with the venus-lavapipe CI job? I'm trying to reproduce a failure locally (via vtest), but it's not clear I have the setup right. am I supposed to be able to test it using the vtest bypass as described by https://docs.mesa3d.org/drivers/venus.html#vtest -- is that a correct way to reproduce that setup, or am I missing something?
18:52 cmarcelo: in particular the vulkan software implementation (I'm assuming this is what is called "lavapipe") doesn't seem to be in the loop, as the virgl_test_server seems to use an underlying GL implementation.
19:21 anholt: cmarcelo: I've successfully tested with vtest using those instructions. virgl_test_server does probably do some gl work at startup, but if you're running a vk cts test, then I don't see how you'd end up with anything other than vulkan doing real work?
19:22 anholt: cmarcelo: note that vtest won't be exactly the same, since venus-lavapipe is running an actual VM (crosvm-runner.sh). but most likely any refactor you're doing would be reproducible across just vtest.
19:30 cmarcelo: anholt: I guess I was misled by the gl work at startup. it makes more sense now.
19:31 cmarcelo: I still don't get the failure itself but the vtest server side seems unhappy when executing it: vtest_resource_create_blob called virgl_renderer_resource_export_blob which failed (-22)
19:33 cmarcelo: going deeper, it turns out it hits VIRGL_RENDERER_BLOB_FD_TYPE_OPAQUE when handling the export blob case. wondering if I'm hitting a limitation of what vtest can do here.
19:34 cmarcelo: the fact that it happens also on the main branch makes me think that's the case.
19:37 cmarcelo: and... test passes if I use vulkan software impl directly. :(
19:43 cmarcelo: anholt: how experimental is venus? trying to figure out if this is a case of "add a skip and move on" or "keep digging"
19:45 DavidHeidelberg[m]: cmarcelo: one lead can be CI failure rate for venus jobs 😉 but it's getting more stable recently
19:47 cmarcelo: DavidHeidelberg[m]: how can I see this?
19:48 cmarcelo: MR is currently being blocked by this venus job failing, so I was assuming it was stable/passing
19:50 DavidHeidelberg[m]: cmarcelo: when you open issues, filter by CI tag (usually most recent report is in "open", older in "closed")
19:50 DavidHeidelberg[m]: s/CI/CI Daily/
19:51 DavidHeidelberg[m]: Yeah, currently it should be fairly stable, if you repeat the job run and it still fails, it's probably your mistake :D
19:55 cmarcelo: anholt: in https://docs.mesa3d.org/drivers/venus.html the instructions for using crosvm depends on having a valid image, is there an easy way for me to reproduce locally the image CI uses for this?
19:56 anholt: cmarcelo: top of the job log should have a fetching of the rootfs, I'd just download that and unpack it to find the image ci is using.
20:01 cmarcelo: anholt: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/39345595 ~> "fetching of the rootfs" do you mean look in the docker image used by CI, or the URL in MESA_IMAGE (line 786)?
20:03 anholt: cmarcelo: ah, right. the docker container's what I actually mean (harbor.freedesktop.org/mesa/mesa/debian/x86_test-vk:2023-04-02-piglit-2391a83d--2023-03-27-virglrenderer-crosvm--d5aa3941aa03c2f716595116354fb81eb8012acb). I've been reading too many lava logs.
20:13 daniels: tomeu: looks like nir_op_uclz was added in an MR which was written before SPIR-V CL, but SPIR-V CL got merged first; when the MR to add op_uclz was finally ready it was only ever hooked up for GL and never fixed the vtn path
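(For reference, a sketch of the identity that makes lowering clz to ufind_msb attractive, assuming NIR's convention that ufind_msb returns -1 for zero; plain C functions here stand in for the NIR opcodes.)

```c
#include <assert.h>
#include <stdint.h>

/* ufind_msb: bit index of the most significant set bit, -1 for 0
 * (matching NIR's nir_op_ufind_msb convention). */
static int ufind_msb(uint32_t x)
{
    return x ? 31 - __builtin_clz(x) : -1;
}

/* uclz expressed via ufind_msb: uclz(x) = 31 - ufind_msb(x).
 * The x == 0 case needs no special handling: ufind_msb(0) == -1,
 * so the result is 32, exactly what uclz is defined to return. */
static int uclz_lowered(uint32_t x)
{
    return 31 - ufind_msb(x);
}
```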
21:29 kchibisov: daniels:
21:30 daniels: kchibisov: I don't have ops here
21:30 kchibisov: Sad.
21:30 daniels: robclark or MrCooper might be around
21:31 i509vcb: irccloud doesn't really show who is an op here
21:31 kchibisov: i509vcb: that's because you ask ChanServ to get op or something.
21:33 daniels: yeah, /m chanserv access #dri-devel list
21:35 i509vcb: yeah it seems like that, I remember a fun drama event where someone was saying someone else didn't have op and no one knew until the *!*@* ban was whipped out
21:35 daniels: robclark: thanks
21:35 robclark: np
21:45 DavidHeidelberg[m]: mareko: imho the MR breaks previously working GLES on raven, why should I add a flake entry when it'll stay broken for raven? Some stuff has probably started being flaky-broken
21:45 DavidHeidelberg[m]: it would make sense to disable it for the affected HW until resolved, but adding a flake entry for code which worked just fine before this feature landed seems really wrong
21:56 robclark: if that skqp test was failing 80% of the time that would probably be problematic IRL (chrome/ium heavily uses skia and skqp is part of android cts)
22:09 DavidHeidelberg[m]: also LibreOffice uses Skia
22:10 DavidHeidelberg[m]: though probably not that often on ES2+
22:58 mareko: re-assigned to marge
23:59 mareko: cwabbott: ssa_def_dominates() uses block dominance in combination with the instruction index to determine which instruction is first, which is ok for some uses and very fast, but it's not true dominance of the SSA def graph. A true SSA def dominance would have to use nir_foreach_src to walk the graph. Also, it would have to handle the case that an immediate dominator doesn't exist for loads that don't