IRC Logs of #dri-devel on irc.freenode.net for 2023-04-11

00:00 mareko: have any srcs, and the dominance algorithm would have to insert a dummy node that dominates all such loads while building the dom tree to compensate. Also, nir_instr would need 6 new dom variables for dominance including 2 set data structures. (there are ways to store those variables outside of nir_instr) In a nutshell, there is no true instruction-level dominance in NIR and it would be very expensive
00:00 mareko: to use it everywhere, but there are a few niche cases where instructinn-level dominance is the only solution to the problem.
00:01 mareko: and I also need post-dominance, so I'll add both
00:02 mareko: and post-dominance would be a separate dom tree
01:33 cwabbott: mareko: first off, unless you're actually running into perf issues, that's very much premature optimization
01:35 cwabbott: second off, adding the full dominance info per instruction is very much overkill - instructions are linear, just use the nir_instr::index if you're going to be making a lot of those queries
04:23 mareko: cwabbott: this is not for any existing NIR pass; also, I need the exact SSA graph of instructions because I'll be moving it between shaders, as well as be able to analyze whether it's legal to move them, which is highly dependent on which instruction are in the SSA graph
04:25 cwabbott: all of that is irrelevant to what I just said
04:26 mareko: cwabbott: what did you say then?
04:26 cwabbott: that you shouldn't reinvent nir dominance for your new pass
04:27 mareko: cwabbott: consider this shader that's complex, but has this one line that's simple: "output = input0 + input1;" - the goal is to find such an expression and move "input0 + input1" into the previous shader and replace the line with "output = new_input;" - how am I going to do with the current dominance?
04:27 cwabbott: use the existing thing, or if it's too slow then use the instruction index or build your own
04:28 mareko: *do it with
04:28 cwabbott: umm, you don't need dominance at all for that
04:29 cwabbott: you're way over complicating it, if that's your goal
04:29 mareko: and I need to prove that "+" is legal to move across interpolation instructions
04:30 mareko: cwabbott: I could do DFS starting from all outputs, but the time complexity would be much worse than just creating a dom tree and doing a bunch of LCA queries for every pair of output components
04:31 mareko: and every pair of output components is already O(n^2)
04:31 cwabbott: find the addition, find the store_output instruction, add a new output that uses the source of those
04:32 cwabbott: uhh, no
04:32 mareko: I mean input components
04:33 mareko: so DFS would start from all inputs, looking for a post-dominator, and that would be the replacement input for everything preceding the post-dominator
04:33 cwabbott: it's literally just searching the shader for an add with two inputs as sources, why on earth would you have to do anything with dominance or whatnot
04:33 cwabbott: you're way over-complicating it
04:34 mareko: cwabbott: it can be an arbitrarily complex expression consisting of ALUs, uniform loads, UBO loads, constants, and multiple input loads
04:35 mareko: the goal is to find the farthest post-dominator of inputs and then prove that all instructions being post-dominated are movable across interpolation
04:35 cwabbott: if you want something more fancy, look at what opt_preamble is doing
04:36 cwabbott: it also moves an arbitrarily complex expression into the preamble
04:36 cwabbott: still, no dominance or whatever is necessary
04:36 HdkR: It's the ideal problem where some application is doing a wackload of work in the fragment shader but it can be moved to a prior stage. The big win of whole program optimization :)
04:36 mareko: I already have that for expressions with uniform loads, UBO loads, constants, and ALUs
04:37 mareko: I only need post-dominance if there are also input loads
04:37 cwabbott: uhh, no you don't
04:38 cwabbott: whether you can move some instruction has nothing to do with whether it post-dominates an input
04:39 cwabbott: it sounds like you have some assumption you're not telling me or you're going about it wrong
04:41 mareko: cwabbott: I see, the assumption is that the number of inputs should always decrease if I move some code into the previous shader, so a path 2 input load should always lead to the same SSA node
04:42 mareko: *path from 2 input loads
04:43 mareko: in theory, a path from 3 input loads could also lead into 2 independent SSA nodes and that would be OK too
04:44 cwabbott: you could just as easily have 3 different linear combinations of 2 different inputs
04:44 cwabbott: again, post dominance has nothing to do with whether you increase or decrease the number of inputs
04:45 mareko: if there are 3 different linear combinations of 2 different inputs, that would lead to 3 replacement inputs, which is what I'm trying to prevent
04:46 cwabbott: yes, and that could happen regardless of whether those uses post-dominate the inputs
04:46 cwabbott: so, again, what do you need post-dominance for?
04:48 mareko: only 0 or 1 SSA node post-dominates a set of inputs
04:48 mareko: cwabbott: a query for: given a pair of inputs, find an SSA node that, if I were to remove that SSA node, it would also remove the inputs
04:50 mareko: it's not: find an arbitrary number of instructions that can be moved across between shaders
05:01 cwabbott: for that specific query, you can only remove both inputs if both have one use
05:02 cwabbott: so, it's simple: check if it has one use, find that use and check if the other source is also an input with one use
05:02 mareko: cwabbott: they can have multiple uses (it can be a branching graph), but eventually the graph should converge into 1 SSA node
05:04 cwabbott: that can only happen if there's a tree/graph of additions
05:06 mareko: cwabbott: for example: https://imgur.com/a/zttXND4 ; for FS, there are limits on which ALUs can be present, but other shaders don't have any limits
05:07 mareko: also flat inputs don't have any limits either
05:08 mareko: Mul2 wouldn't be movable for FS, but it would be movable for other shader types
05:10 mareko: the initial idea was to start DFS/BFS from inputs and record what's movable, but post-dominance just gives me the right answer and it's less code
05:11 cwabbott: mareko: for something that involved, I'd do something similar to what opt_preamble does and scan each instruction and store whether it's moveable
05:11 cwabbott: and no, post dominance has absolutely nothing to do with it
05:11 mareko: except that it gives the right answer
05:12 cwabbott: no it does not
05:12 mareko: why not?
05:13 cwabbott: the position of the instruction has nothing to do with whether removing it removes both inputs
05:14 mareko: cwabbott: if I remove Mul2 from that graph, it removes the inputs; do you a graph where that wouldn't work?
05:14 cwabbott: also, if everything except the input is inside an if, nothing postdominates the inputs and yet that shouldn't affect whether you can move it
05:17 mareko: cwabbott: I need to decrease the number of inputs, or at minimum, keep the number of inputs the same, I can't move everything that's movable
05:18 mareko: yes, there are cases where a post-dom tree wouldn't find movable code
05:20 cwabbott: mareko: what exactly are you claiming? that the LCA of all the inputs in the post-dominance graph is Mul2? because it's not
05:22 mareko: cwabbott: there are 3 exits, and Mul2 is the only node that all paths from input nodes must go through to reach one of the exits
05:22 mareko: yes, Mul2 post-dominates all inputs
05:22 cwabbott: no, Add1 does
05:23 cwabbott: for any schedule of those instructions
05:23 cwabbott: and ofc this all may be dependent on the way instructions are scheduled
05:24 cwabbott: but for that particular case? only Add1 does
05:24 cwabbott: *only Add1 can be the LCA of the post dominance tree
05:24 cwabbott: assuming that these are all in the same basic block, ofc
05:25 mareko: cwabbott: you're thinking in terms of the instruction index in NIR, not in terms of the graph, scheduling has no effect
05:25 cwabbott: mareko: if that's what you're doing, then don't call it post-dominance
05:29 mareko: cwabbott: "a node z is said to post-dominate a node n if all paths to the exit node of the graph starting at n must go through z" - not all paths from Input0 must go through Add1 in the graph, thus Add1 doesn't postdominate Input0
05:29 mareko: again, there are no basic blocks here, just the graph
05:32 mareko: cwabbott: ssa_def_dominates() is CFG graph dominance with an instruction index within a block, good for control flow and CSE, but it's not instruction graph dominance
05:34 mareko: ssa_def_dominates() is really scheduling-dependent instruction dominance, which is not graph dominance
05:40 mareko: post-dominance is not the ideal tool for the problem (what nir_opt_preamble does might be better), but it would work
05:46 mareko: think of srcs being the graph edges, not the linked list of instructions
05:49 cwabbott: yes, now I see what you're doing
05:50 cwabbott: dominance usually means "scheduling-dependent" dominance
05:50 cwabbott: tbh I've never heard of using dominance but on the def-use graph
05:55 mareko: if I decide to use, I might put it in nir_instr_dominance.c, or we can call it dataflow graph dominance or whatever
05:56 mareko: *use it
06:09 tomeu: daniels: thanks for the investigation!
08:59 kusma: Today is CI appreciation day for me! I tend to mostly talk about our CI when I'm annoyed with it, but today I started a Zink CI pipeline, and there's *no* "needless" jobs in my pipeline! That makes me very happy, thanks everyone who helped with that :)
09:04 ccr: all hail our CI overlords
09:05 HdkR: CI is good
09:06 ccr: CI extends life. CI expands consciousness. CI is vital to Mesa development.
09:10 DavidHeidelberg[m]: ccr: "CI extends life." - not mine :D
09:10 DavidHeidelberg[m]: agree with the rest :D
09:11 ccr: :P
09:37 dolphin: danvet: "Merge tag 'drm-misc-next-2023-04-06' of git://anongit.freedesktop.org/drm/drm-misc into drm-next" <- I assume that means drm-misc-next is now in and we can proceed with backmerge (I see Rodrigo's drm-intel-next PR is there too)?
09:42 danvet: dolphin, yeah I pinged you before easter
09:42 danvet: or at least wanted to
09:42 danvet: maybe triple check with ci that drm-tip is all fine
09:42 dolphin: you pinged about not to backmerge yet, but if you also said it's good to go I did miss that indeed
09:43 danvet: all the -next pulls before easter should be in
09:43 danvet: dolphin, it was fairly late on Thu, I guess you were gone already
09:43 danvet: also -fixes is on -rc6 for anyone who needs that, just pushed that out
09:49 dolphin: doesn't seem any more broken than before at : https://intel-gfx-ci.01.org/tree/drm-next/combined-alt.html
09:55 danvet: dolphin, btw 2 kerneldoc warnings from -gt-next I think, see sfr mails
12:32 DavidHeidelberg[m]: we have new flake since yesterday: dEQP-VK.draw.dynamic_rendering.complete_secondary_cmd_buff.linear_interpolation.no_offset_4_samples any idea who could be the offender (6x times since yesterday)
12:32 daniels: DavidHeidelberg[m]: which hw?
12:33 DavidHeidelberg[m]: daniels: right.. also Raven: radv-raven-vkcts:amd64
12:34 hakzsam: it's a known flake
12:35 DavidHeidelberg[m]: kk, thanks
12:35 DavidHeidelberg[m]: adding into flakes
12:36 DavidHeidelberg[m]: hakzsam: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22411
12:41 DavidHeidelberg[m]: thx
13:55 danvet: karolherbst, Lyude, javierm https://lore.kernel.org/dri-devel/20230318133952.1684907-1-trix@redhat.com/
13:55 danvet: in general for any rh patches really :-)
13:55 danvet: also I guess tom rix should get drm-misc commit rights?
13:56 karolherbst: mhh dunno
13:56 karolherbst: maybe
13:56 danvet: there's been a pile of patches and I see more all over
13:56 karolherbst: but most patches are mostly about fixing compiler warnings
13:56 danvet: yeah still you can push those
13:56 karolherbst: but yeah, I have a list I need to go through
13:56 danvet: sucking away other maintainer time isn't great even for this kind of stuff :-)
13:56 karolherbst: right
13:56 danvet: my take at least
13:57 danvet: like same for doc patches or whatever (but I'm still hoping on some good editor magically showing up for dri-devel docs)
13:57 karolherbst: heh
13:57 karolherbst: yeah, let me collect Toms patches today, I wanted to do that anyway
13:58 danvet: the amd ones need to go through agd5f but everything else should be fine for -misc
13:58 danvet: didn't yet see an i915 one
14:05 javierm: danvet: how was the struct fb_var_screeninfo .activate set to FB_ACTIVATE_KD_TEXT issue found ooi? It was by fuzzing?
14:07 danvet: javierm, I wanted to validate ->activate in the drm fb helpers and realized I couldn't
14:08 danvet: it's fundamentally control flow that the top-level callers need to get right
14:08 karolherbst: uhh.. what's the best way to fetch patches if patchwork is annoying?
14:08 danvet: lore
14:08 javierm: danvet: I see
14:08 danvet: doens't add the tags
14:09 karolherbst: lore doesn't have it either
14:09 danvet: karolherbst, https://lore.kernel.org/dri-devel/?q=f%3Atrix
14:09 danvet: then it never went to the list
14:09 danvet: most likely at least
14:09 karolherbst: ahh, there it is
14:10 javierm: karolherbst: https://patchwork.kernel.org/project/dri-devel/list/?submitter=Tom+Rix&archived=both works too
14:13 javierm: hmm, you are right that some are missing. Strange
14:14 karolherbst: yeah.. so I oversaw two patches (but there were older versions of the same thing, so I picked those)
14:15 karolherbst: most of the other are either pushed or have review pending
14:15 karolherbst: also.. I really have to CI my patchwork update script stuff
14:15 karolherbst: now the patch in pwclient I relied on is even upstreamed
14:20 dolphin: danvet: did the backmerge of drm-next to drm-intel-gt-next
14:37 tleydxdy: I'm prototyping some kernel patches, and I want to have a test to interact with drm ioctls directly (not throught libdrm), right now I'm getting EPERM even though the ioctl call is allowed on the render node, any ideas?
14:37 tleydxdy: I just put toghether the struct and call ioctl() on the fd
14:38 danvet: javierm, since thomas isn't around, did you see https://lore.kernel.org/dri-devel/4Psm6B6Lqkz1QXM@panix3.panix.com/ ?
14:38 tleydxdy: not sure if there's anything additional I need to do
14:38 danvet: sounds like it's about legacy bios fun
14:38 danvet: mlankhorst, mripard drm-misc-fixes still stuck on -rc2, maybe backmerge?
14:38 danvet: it already has more patches so ff isn't an option
14:40 danvet: mairacanal, yeah running composition tests in kunit is maybe too much work, since it'd require porting a pile of the igt infrastructure for rendering
14:40 danvet: my idea was to essentially do exactly what igt does, but in a kunit
14:41 danvet: and driving vkms directly
14:41 danvet: it has the upside that you could compare raw buffers instead of just crc
14:43 danvet: agd5f, Lyude https://lore.kernel.org/dri-devel/?q=fed2a3f9111713cc619cbd54d7c1be0987c7da7b.camel%40kernel.org
14:48 javierm: danvet: yeah, I had in my TODO but didn't dig on it yet. Let me take a look
14:53 danvet: javierm, yeah nw, just pinging things I see fly by :-)
14:55 javierm: danvet: hmm, it seems the BIOS is reporting a wrong LFB and was relying on bpp always being set to lfb_depth
14:56 javierm: VESA is the gift that keeps on giving :)
15:00 danvet: robclark, I'm kinda assume you're lockdep-testing locking changes :-P
15:00 danvet: but I've had cases where people acked/tested-by stuff without that ...
15:02 robclark: I guess I should double check.. but I _assume_ we have lockdep enabled in the CI build that gets igt run on it..
15:03 robclark: if we don't, that is an obvious oversight to fix
15:03 robclark: but yeah, I do have lockdep enabled when making locking changes ;-)
15:57 javierm: danvet: dunno what's going on, it seems the kernel and grub don't agreed on the passed video mode. I asked the reporter for more info
15:58 javierm: but can't spot anything wrong in Thomas' patch...
16:25 eric_engestrom: Mesa 23.1 branchpoint in 24h; ping me early if this is an issue for you :)
16:41 mairacanal: danvet, i don't know exactly how this could work, as kunit is more of a unit testing framework than a full-scope testing framework
16:41 mairacanal: also, it might be hard to port the whole igt rendering infrastructure to kunit
16:41 mairacanal: on the other hand, it would be nice to see unit tests for vkms formats for example, as format conversion is not tested on igt afaik
16:41 mairacanal: i believe that kunit tests go better with this type of isolated function testing
16:43 danvet: yeah ...
16:43 danvet: plus it sounds like emersion is going to type some userspace for this
16:48 emersion: do we want CRTC background, or plane background for vkms?
16:49 emersion: plane background would be a bit more useful, but may not reflect how some hw works
17:01 javierm: mairacanal: agreed
17:03 mairacanal: emersion, we were targeting CRTC background
17:04 emersion: i'd prefer to have the other one, is what i'm saying
17:37 melissawen: emersion, looking drmdb, it seems there are drivers using something as a downstream CRTC background property here (?) https://drmdb.emersion.fr/properties?object-type=3435973836
17:37 melissawen: but if plane bkg is more relevant for userspace
17:38 melissawen: I'd also prefer a plane bkg so
17:39 emersion: hm i don't think this driver is upstream
18:27 danvet: lina, https://lore.kernel.org/dri-devel/CABXGCsPZxgpFzAVN=eFXu0WV+Jk0vB4rv4p+Jt31C841LcqV=A@mail.gmail.com/
19:23 karolherbst: anybody experience with kernels warning about interrupt handler enabling interrupts?
19:24 karolherbst: a user posted an issue and it has a warning like "irq 127 handler nvkm_intr+0x0/0x240 [nouveau] enabled interrupts"
19:24 karolherbst: but somehow none of this makes much sense to me
19:26 karolherbst: but we also never call local_irq_enable
20:31 jenatali: Hm, why didn't https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22428 get a label automatically applied?
20:31 jenatali: Not a draft, not already labeled...
20:32 jenatali: And I see a rule for '^src/gallium/frontends/va/'
20:33 jenatali: Hm, is mr-label-maker working at all?
20:38 javierm: karolherbst: is weird indeed, I'm looking at https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/nouveau/nvkm/core/intr.c#L164 and AFAICT the function always call to the _locked variants
20:38 javierm: that never do spin_lock_irq{save,restore}
20:39 javierm: and all the function handlers executed are just writing to registers, nothing more
20:45 karolherbst: yeah.. it's weird
20:45 karolherbst: maybe some function call somewhere, but...
20:46 karolherbst: there is a lot of indirection going on
20:46 karolherbst: but yeah...
20:46 karolherbst: here is the bug report btw https://gitlab.freedesktop.org/drm/nouveau/-/issues/206
20:52 karolherbst: jenatali: yeah.. looks dead
20:53 karolherbst: daniels: MR Label Marker seems to be dead
20:56 DemiMarie: Would it be safe for Qubes OS to replace the old Mesa package in its dom0 with a newer one?
20:57 karolherbst: I doubt anybody here would know
20:58 karolherbst: 1. what is considered safe, 2. what's the version currently shipped 3. what's the version considered as a replacement
20:59 psykose: yes. ship it
20:59 karolherbst: if 1 is about safe as in backwards/forwards compatible, then yes
20:59 karolherbst: if it's about security? no idea
20:59 javierm: karolherbst: I think that followed all the call paths and couldn't find any place where IRQs would be re-enabled... I wonder what we are missing
20:59 karolherbst: maybe some funky downstream patch, no idea
21:00 karolherbst: I never seen this error in my life
21:04 javierm: karolherbst: maybe ask them if they have a kernel with debug symbols to do: ./scripts/faddr2line drivers/gpu/drm/nouveau/nouveau.ko nvkm_intr+0x0/0x240
21:04 javierm: or something like that, it might give more info that the current traces
21:04 karolherbst: well, it's just the address of the function
21:04 karolherbst: nah
21:04 javierm: karolherbst: ah
21:04 karolherbst: the kernel just prints the address of the handler
21:04 karolherbst: and does so after the call
21:05 karolherbst: I'd like to know what or where IRQs were reenabled, but I doubt there is a cheap way of figuring that out
21:05 javierm: not that useful then :(
21:05 karolherbst: could dump_stacktrace in each enablement of IRQs :D
21:05 karolherbst: but...
21:06 javierm: karolherbst: or ftrace using the graph and filter irq_enable() ?
21:16 DemiMarie: karolherbst: Safe as in it won’t pull in a huge swath of dependencies (other than LLVM) and won’t break apps
21:40 karolherbst: in this case mesa is always safe
22:00 daniels: karolherbst: give me an MR which it didn't tag?
22:09 karolherbst: daniels: e.g. https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22428
22:09 karolherbst: but all the newest ones aren't taged
22:09 karolherbst: by it at least
22:12 daniels: karolherbst: yeah, seems like
22:14 daniels: {"error":null,"exit_code":0,"id":"b7f60a7a-20b8-41d1-a975-8f2b1a2d6be8","name":"process webhook","stderr":"","stdout":"https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22428 | frontends/va: disable skip_frame_enable in vaapi interface.\nunable to determine labels\n"}
22:14 daniels: so just missing a regex by the looks
22:14 daniels: (whot is better to ping for that btw)
22:31 karolherbst: mhh, jenatali said there was one already, but I didn't double check
22:54 jenatali: https://gitlab.freedesktop.org/freedesktop/mr-label-maker/-/blob/main/mr_label_maker/mesa.py#L230
22:54 jenatali: I don't see why that wouldn't apply...
23:04 daniels: I’d ping whot and check if he can debug
23:04 jenatali: whot: ^^
23:04 jenatali: Hm, looks like he's not here?
23:04 daniels: it’s now part of the webhook spam-fighting skynet
23:04 daniels: try #freedesktop
23:05 jenatali: 👍