00:00anholt_: handling non-ssa src regs deep in the tree feels tough. I think you'd need to keep a HT of live instrs you could source from that you're handing to the search?
00:00jekstrand: I think non-SSA is probably doable if you're careful but you'd better have good copy-prop because it'll emit a LOT of MOVs.
00:00anholt_: yeah, source mods are a definite no imo
00:00jekstrand: Basically, I think you'd need to have a way to retroactively emit a MOV
00:00jekstrand: Either that or only match non-SSA sources on the root instruction
00:01jekstrand: The latter would probably be doable
00:01anholt_: was thinking maybe the answer was "move regs out of ALU ops other than mov, do the instruction selection, then try to copy prop movs."
00:01anholt_: which incidentally feels like a step toward regs-out-of-NIR-src/dst.
00:01jekstrand: Another option is that cwabbott and I have talked about dropping registers from nir_src altogether and instead having load/store_reg intrinsics.
00:02jekstrand: Yeah, I'm kind-of inclined towards the dropping registers from NIR solution.
00:02jekstrand: Well, dropping them from nir_src/dest
00:02anholt_: fd's already relying on backend "copy prop" (it's a pass with multiple jobs) for even things like dst regs.
00:02jekstrand: The big downside there is that it means your back-end copy-prop has to be really good
00:02anholt_: v3d has mad hacks for handling dst regs everywhere that would be easier as a store reg intrinsic.
00:03jekstrand: I think in vec4 we genuinely get some value out of nir_alu_instr being able to write directly to a reg
00:03jekstrand: Mostly for write-masking
00:04jekstrand: Though we could probably still handle that in the back-end somehow, especially since the store_reg would come immediately after the instruction 90% of the time. That'd make it pretty easy to detect properly.
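The point about a store_reg landing right after the producing instruction can be sketched as a small back-end peephole. This is an illustrative toy in Python, not Mesa code: the `Instr` class, the operand layout of `store_reg` (register in `dest`, the stored SSA value as its only source, plus a write mask), and `fold_store_reg` are all hypothetical names invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    op: str
    dest: str                 # "ssa_N" for SSA values, "rN" for registers (toy convention)
    srcs: tuple = ()
    write_mask: str = "xyzw"

def fold_store_reg(instrs, use_counts):
    """Fold `store_reg rN <- ssa_M` into the instruction defining ssa_M.

    Only folds when the store immediately follows the definition and the
    SSA value has no other use, so the write mask can move onto the ALU op.
    """
    out = []
    for ins in instrs:
        prev = out[-1] if out else None
        if (ins.op == "store_reg" and prev is not None
                and prev.op != "store_reg"
                and ins.srcs == (prev.dest,)
                and use_counts.get(prev.dest, 0) == 1):
            # Rewrite the producer to write the register directly.
            out[-1] = Instr(prev.op, ins.dest, prev.srcs, ins.write_mask)
        else:
            out.append(ins)
    return out
```

Stores that do not directly follow their producer, or whose value has other uses, are left alone in this sketch; a real back-end would presumably fall back to a MOV there.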
00:05jekstrand: I'm just generally unconvinced of the utility of advanced pattern matching for NIR -> back-end translation.
00:05jekstrand: I've seen very few cases on our hardware where that's actually the right place to put a transformation
00:05jekstrand: It's always either "that can go in NIR" or "if we had competent back-end opt_algebraic, we could handle that"
00:07anholt_: jekstrand: comparisons are somewhere that often wants to have multiple NIR instrs consumed at once.
00:08anholt_: I think I needed to have ntq_emit_comparison() chain multiple _pf updates to handle ands and ors.
00:09anholt_: but with stuff like that, you also want to reuse a partial computation of it that you may have had to generate for some other subexpression use, and thus NOLTIS.
00:10anholt_: another is "we can only reference the uniforms register file in one src arg of an instr, and you want to decide which uniform turns into a MOV" (fd and v3d both have this)
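The "only one src may reference the uniform file" constraint can be sketched as a legalization step. This is a toy under stated assumptions, not fd or v3d code: `legalize_uniform_srcs`, the tuple-based instruction form, and the `is_uniform` / `new_temp` callbacks are hypothetical, and the sketch assumes that repeating the *same* uniform in two sources is allowed. It uses the naive policy of keeping the first uniform in place; picking which uniform becomes a MOV is exactly the decision a smarter selector would make.

```python
def legalize_uniform_srcs(op, dest, srcs, is_uniform, new_temp):
    """Return a list of (op, dest, srcs) tuples in which the final
    instruction references at most one distinct uniform."""
    out = []
    fixed = []
    kept_uniform = None
    for s in srcs:
        if is_uniform(s):
            if kept_uniform is None or s == kept_uniform:
                # First uniform (or a repeat of it) stays as a direct source.
                kept_uniform = s
                fixed.append(s)
            else:
                # Any further uniform is copied to a temporary first.
                t = new_temp()
                out.append(("mov", t, (s,)))
                fixed.append(t)
        else:
            fixed.append(s)
    out.append((op, dest, tuple(fixed)))
    return out
```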
00:18jekstrand: anholt_: Yes, we have some similar issues with our flag reg
00:18jekstrand: And *maybe* NOLTIS can solve it but that assumes it understands flag pressure which it doesn't
00:19anholt_: v3d's flag usage is all within an instruction. so when you use a bool, you go reaching back to see if you can find a comparison to fold in.
00:20anholt_: (within emitting of a NIR instruction, that is)
00:20jekstrand: That makes sense
00:21jekstrand: I had code for brw_fs_nir.cpp to re-emit comparisons on-the-spot in the hopes that CSE could fix it up
00:21jekstrand: It was kind-of a wash IIRC
00:21jekstrand: It's been a while
00:22jekstrand: Part of the problem is that re-emitting a comparison means two live registers instead of one.
00:22jekstrand: I'm hoping that IBC can do better since it actually RAs the flag
00:23anholt_: for v3d, if I can't fold the comparison it's compare, mov, (pred)mov, so moving the compare down to avoid the sel movs is a pretty big deal.
00:23anholt_: (that is, flt is compare, mov, predmov, then bcsel is another compare, mov, predmov again.)
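The "reach back for a comparison to fold in" approach can be sketched as follows. This is a toy emitter, not v3d code: `emit_bcsel`, the `defs` map, the set of foldable ops, and the `cmp.*` / `predmov` mnemonics are hypothetical names chosen for the example.

```python
FOLDABLE = ("flt", "fge", "feq")  # hypothetical set of foldable comparisons

def emit_bcsel(dest, cond, x, y, defs):
    """Emit a select of x (cond true) or y (cond false) into dest.

    defs maps an SSA name to its defining (op, a, b) tuple. If cond is
    defined by a foldable comparison, that comparison sets the flags
    directly; otherwise the materialized bool is compared against zero.
    """
    out = []
    d = defs.get(cond)
    if d is not None and d[0] in FOLDABLE:
        op, a, b = d
        out.append(("cmp." + op, a, b))       # folded: reuse the original compare
    else:
        out.append(("cmp.ine", cond, 0))      # unfolded: test the stored bool
    out.append(("mov", dest, y))              # default value
    out.append(("predmov", dest, x))          # overwritten when the flag is set
    return out
```

In the unfolded case the bool itself was already produced by its own compare, mov, predmov sequence, so folding saves that whole triple whenever the comparison result has no other user.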
00:24anholt_:shakes fist at the flags register access producing 0x00010001 for true
00:24jekstrand: Makes sense
00:24jekstrand: For us, it's less obvious
00:24jekstrand: And with the CSEL instruction, we can actually consume the 0/~0 boolean directly
00:25anholt_: freedreno compares are so much nicer. just stick 1 or 0 in a reg, thanks.
00:25jekstrand: See also https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4617
00:26jekstrand: There's a fun trick in there involving NaN. :D
00:33anholt_: why using REGISTER_TYPE_F for your csel args?
00:33anholt_: oh, wait, it is F only
00:34jekstrand: It starts supporting HF, D, and W on Gen11
00:34jekstrand: But we likely want to keep it F there because SEL with float source modifiers is surprisingly common
00:35anholt_: adreno has float, int, and bitwise SEL, which controls the type of all 3 src args (and thus what negate means on them, and what you're comparing against)
00:36anholt_: so we'd actually like to go matching and say, oh look there's an fneg in my srcs, let's do the SEL_F
00:45jekstrand: Yeah, yet another reason why NIR source modifiers are a bit garbage
00:45jekstrand: I'd rather match on "fneg" than "negate with the type of the SEL"
00:45jekstrand: The way we handle this in our back-end is that the copy-prop pass which propagates modifiers into sources is able to flip the type of the SEL from int to float if needed.
00:47jekstrand: I've got to think a bit more about how to do CSEL properly. I suspect we'll want a BCSEL opcode which has the semantics of "I promise the selector is 0/~0" so that we can flip the types of all three sources safely.
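One way to see why the "selector is 0/~0" promise matters when flipping types: if the select condition is "selector != 0" evaluated in the instruction's type (an assumption made for this sketch, not a statement about the hardware or the merge request), then 0 and ~0 give the same answer as int and as float, because 0xFFFFFFFF is a NaN and NaN != 0.0 is true. An arbitrary nonzero integer such as 0x80000000 does not, since it reads as -0.0 in float. The helper names below are hypothetical.

```python
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def sel_int(sel_bits, a, b):
    # Integer-typed select: any nonzero bit pattern counts as true.
    return a if sel_bits != 0 else b

def sel_float(sel_bits, a, b):
    # Float-typed select: the same bits are compared against 0.0 as a float.
    return a if bits_to_float(sel_bits) != 0.0 else b

def types_agree(sel_bits):
    """True if int-typed and float-typed selects pick the same operand."""
    return sel_int(sel_bits, "a", "b") == sel_float(sel_bits, "a", "b")
```

For example, `types_agree(0x00000000)` and `types_agree(0xFFFFFFFF)` both hold, while `types_agree(0x80000000)` does not.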
00:48bl4ckb0ne: how does it work for merging a pull request on mesa that depends on a pull request on piglit?
00:49imirkin: karolherbst: make sure you don't enable the wound_wait lock debugging stuff
05:47sravn: airlied: did we miss a drm-misc-fixes pull for -rc2?
09:23MrCooper: bl4ckb0ne: merge the piglit MR first, then update piglit in the docker image in the Mesa MR
09:23MrCooper: dcbaker: thanks! I'll test it on Monday at the latest
10:55Putti: I'm still trying to get the lima mesa driver loading with X11 but now I think it only sets up the card0 node since the screen just blanks: http://paste.debian.net/plain/1141118
10:56Putti: The lima render node is the renderD129 node
10:56Putti: it seems to try to do some PCIe stuff which is wrong
10:57Putti: it should probably try to load the preferred driver (lima) with DRI_PRIME=1 env variable but that is not happening for some reason
11:02Putti: I have been trying to find some instructions on what environment variables or X11 settings I need for this but without luck, so if anybody knows any articles that would be helpful!
11:47Putti: I modified the exynos drm driver now to not show up as render node so now lima node should be selected but still blank screen
11:49Putti: I wonder if VGA arbiter X11 module is required for this to work (I don't have it installed)
14:00Putti: Now I got the EGL debug messages shown: http://paste.debian.net/plain/1141141 So something about EGL_BAD_MATCH and EGL_NOT_INITIALIZED