IRC Logs of #dri-devel on irc.freenode.net for 2025-02-03

08:28 karolherbst: uhh.. we have a new infinite optimization loop in nir :'(
08:30 karolherbst: triggered inside brw_nir_optimize mhh
08:30 glehmann: at least when they are new you can bisect them
08:31 karolherbst: yeah.. already done so, but still not sure what's going on
08:32 karolherbst: nir_copy_prop, nir_opt_algebraic and nir_lower_pack involved it seems...
08:33 karolherbst: uhh..
08:33 glehmann: some nir_opt_algebraic pattern fighting against nir_lower_pack?
08:35 karolherbst: https://gist.github.com/karolherbst/5a77e78f0b9fcc0e89caa9cfeaa03ef7
08:35 karolherbst: yeah...
08:35 karolherbst: it's triggered by b1bc691b0ff44e0acfb0ede759d4d1cf7636fca2 afaik
08:38 karolherbst: I guess opt_algebraic needs to be more aware of the pack lowering options toggled
08:41 karolherbst: also doesn't help that we have skip_lower_packing_ops and all those lower_pack_* options
08:46 karolherbst: I _think_ intel needs to get `.lower_pack_64_4x16 = true,` added
08:47 karolherbst: well.. will send an MR
08:47 karolherbst: but anyway, this packing situation ain't great and if somebody feels motivated enough to streamline things a bit, that would be awesome. Not sure I'm in the mood to dig into all the compilers this week
08:57 karolherbst: anyway, MR opened
09:03 glehmann: nir_lower_pack is also partially redundant with nir_lower_alu_width
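
The "two passes fighting" failure mode described above can be sketched with a toy fixed-point loop. This is a hedged illustration only: the `toy_*` names are hypothetical and none of this models the real `nir_opt_algebraic` or `nir_lower_pack` code. It assumes one pass that prefers a packed form, one lowering pass that always splits it again, and a flag that stands in for the optimizer being aware of the lowering option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy value that is either in packed or split form. */
enum toy_form { TOY_PACKED, TOY_SPLIT };

/* Hypothetical "algebraic" pass: rewrites split -> packed when the pattern is enabled. */
static bool toy_opt_pack(enum toy_form *v, bool pattern_enabled)
{
    if (pattern_enabled && *v == TOY_SPLIT) {
        *v = TOY_PACKED;
        return true;
    }
    return false;
}

/* Hypothetical lowering pass: always rewrites packed -> split. */
static bool toy_lower_pack(enum toy_form *v)
{
    if (*v == TOY_PACKED) {
        *v = TOY_SPLIT;
        return true;
    }
    return false;
}

/*
 * Run both passes until neither reports progress.
 * Returns the number of iterations used, or -1 if max_iters was reached
 * without converging (the passes keep undoing each other).
 */
static int toy_optimize(enum toy_form *v, bool opt_aware_of_lowering, int max_iters)
{
    assert(v != NULL && max_iters > 0);
    for (int i = 1; i <= max_iters; i++) {
        bool progress = false;
        progress |= toy_opt_pack(v, !opt_aware_of_lowering);
        progress |= toy_lower_pack(v);
        if (!progress)
            return i;
    }
    return -1;
}
```

With `opt_aware_of_lowering` set to false the loop never settles and only the iteration cap stops it. With the flag set, the optimizer stops producing the form that the lowering pass removes, so the loop converges.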
09:12 jlawryno: Hi, drm-tip has a merge conflict with drm-intel/topic/core-for-CI
09:12 jlawryno: anyone can help with this?
10:06 sima: tzimmermann, https://lore.kernel.org/dri-devel/cover.1738347308.git.lorenzo.stoakes@oracle.com/ I think with this we could decouple defio from struct page entirely and avoid the bounce buffer for dma allocations
10:06 sima: somehow missed this the first time around, and not entirely sure because monday morning
10:07 sima: but feels like it's the major piece we've been missing
10:13 tzimmermann: sima, thanks for the pointer. i can't say i understand much of the MM logic. but i'll take a look at the series. the cover letter makes it seem urgent
10:32 sima: tzimmermann, it's more long-term conversion towards folio/memdesc that the mm folks aim for
10:34 sima: but a side effect is that I think defio doesn't need struct page anymore to talk to core mm, so if we remove any remnants we have (like dirty bits I think we track in there still, so that'd need to be a bitmap) then I think we're done and could get rid of the phys_to_page from all defio code
10:35 sima: which means we don't need the bounce buffer for dma memory anymore, in case that's not struct page backed
10:35 sima: and so could undo your regression fix again and go for the nice unified world you had in mind
11:12 wens: bbrezillon: I see in panthor_resume() atomic_read(), check, then atomic_set() if check passes; I wonder if it should be atomic_cmpxchg() instead?
11:41 tzimmermann: sima, that sounds good. i was hoping for something like this to happen
12:29 bbrezillon: wens: I guess we could turn that into an atomic_cmpxchg(), yes
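
The difference between the read/check/set sequence and a compare-and-exchange can be sketched in userspace C11. This is a hedged sketch, not the panthor code: the state names and helper functions are hypothetical, and it uses `<stdatomic.h>` rather than the kernel's `atomic_t` API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device states; not the real panthor definitions. */
enum toy_dev_state { TOY_SUSPENDED = 0, TOY_RESUMING = 1, TOY_ACTIVE = 2 };

/*
 * Read, check, then set. Another thread can change *state between the
 * load and the store, so two callers may both believe they won.
 */
static bool toy_begin_resume_racy(atomic_int *state)
{
    assert(state != NULL);
    if (atomic_load(state) != TOY_SUSPENDED)
        return false;
    atomic_store(state, TOY_RESUMING);
    return true;
}

/*
 * Compare-and-exchange. The transition happens only if the state is
 * still TOY_SUSPENDED at the moment of the write, so at most one
 * caller succeeds.
 */
static bool toy_begin_resume(atomic_int *state)
{
    assert(state != NULL);
    int expected = TOY_SUSPENDED;
    return atomic_compare_exchange_strong(state, &expected, TOY_RESUMING);
}
```

Both helpers behave the same when called from a single thread. The difference only shows up under contention, which is what the suggestion to use `atomic_cmpxchg()` is about.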
13:28 zmike: anyone else have issues building piglit with recent python3?
13:42 bl4ckb0ne: is there somebody well-versed with VK and android AHB around? I'm getting `Gralloc4: non-BLOB pixel format with GPU_DATA_BUFFER usage is not supported prior to gralloc 4.1` and im pondering using BLOB pixel formats everywhere
14:05 Lynne: can you not compile nouveau with nightly rustc?
14:05 Lynne: ""1.86.0-nightly" is not a valid Rust target, the patch version number must be an unsigned 64-bit integer"
14:56 alyssa: nir_foreach_block blowing up in a null dereference in nir_cf_node_next because the exec_node_get_next is returning null .. but nir validation is clean .... this is scary (:
15:11 alyssa: ughh. I think this is another case of _safe not actually being safe
15:12 alyssa: *cry*
15:12 zmike: I've had that happen many times
15:14 alyssa: it is entirely too Monday for this
15:20 alyssa: or rather, failing to use _safe
15:20 alyssa: bah
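
The `_safe` pitfall can be sketched on a toy singly linked list. This is a hedged illustration: the list type and macros are hypothetical and much simpler than the `exec_list` helpers that NIR actually uses. The plain iterator reads the successor from the current node after the loop body has run, so the body must not free that node. The safe variant caches the successor first, which protects removal of the current node only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Plain iteration: dereferences pos after the body, so the body must not free pos. */
#define list_foreach(pos, head) \
    for (struct node *pos = (head); pos != NULL; pos = pos->next)

/* "Safe" iteration: the successor is cached in tmp before the body runs. */
#define list_foreach_safe(pos, tmp, head) \
    for (struct node *pos = (head), *tmp = pos ? pos->next : NULL; \
         pos != NULL; \
         pos = tmp, tmp = pos ? pos->next : NULL)

/* Prepend a new node and return the new head. */
static struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->value = value;
    n->next = head;
    return n;
}

/* Sum all values; nothing is removed, so plain iteration is fine. */
static int list_sum(struct node *head)
{
    int sum = 0;
    list_foreach(n, head)
        sum += n->value;
    return sum;
}

/* Free every node with an odd value and return the new head. */
static struct node *remove_odd(struct node *head)
{
    struct node **link = &head;
    list_foreach_safe(n, succ, head) {
        if (n->value % 2 != 0) {
            *link = succ;   /* unlink n from its predecessor */
            free(n);        /* fine: the loop only touches succ from here on */
        } else {
            link = &n->next;
        }
    }
    return head;
}
```

Writing `remove_odd` with `list_foreach` would read `n->next` from freed memory. Note that the safe variant still breaks if the body frees the cached successor.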
16:10 markco: I was working on implementing the NIR side of block matching from https://registry.khronos.org/vulkan/specs/latest/man/html/VK_QCOM_image_processing.html in Turnip and the SPIR-V instructions for them take two sampled images and coordinates which doesn't really work with `nir_tex_instr` only being designed for one instance of any `nir_tex_src`.
16:15 glehmann: you can either add a new nir_tex_src for the second image, or you can use an intrinsic instead of a tex instr
16:15 markco: There's three solutions to this:
16:15 markco: * Make this an intrinsic instead and push most of the handling to the next layer (i.e. IR3 for Turnip), but there's still useful behavior within NIR for texops such as checking if handles are dynamically uniform for the divergence pass and that would need to be duplicated for the intrinsic.
16:15 markco: * Add `2` variants (eg. `nir_tex_src_texture_deref2`) of the existing types.
16:15 markco: * Support multiple sources per type (perhaps only for this texop).
16:20 karolherbst: given the second image has semantics attached to it, naming the second one `deref2` might not be the best idea
16:20 karolherbst: sounds like it's more of a "reference" thing? so maybe call it `src_reference_deref` or so would be better
16:21 markco: glehmann: Yeah, I'm just thinking what the best option might be here and wanted opinions on it
16:22 karolherbst: the confusing part in the spirv ext is, why do you need to decorate images like this?
16:23 karolherbst: anyway.. I think if you add a second image src the name should be more explicit than 2
16:25 markco: karolherbst: Essentially, the operation has a target and reference sampled images (+ coordinates) that the operation is run on. As far as tex srcs go, it should be `nir_tex_src_texture_deref`, `nir_tex_src_sampler_deref` and `nir_tex_src_coord` for each image.
16:26 markco: As for decorating it, I think that's more or less for validation purposes since this has a unique descriptor type (`VK_DESCRIPTOR_TYPE_BLOCK_MATCH_IMAGE_QCOM`)
16:26 karolherbst: I see
16:27 karolherbst: _maybe_ it makes sense to special case things here a little
16:28 markco: In what way, extra src type or support for multiple instances of a single src type?
16:28 karolherbst: probably the latter, given you not only have to add a texture reference, but a few more things
16:28 glehmann: markco: as long as you don't choose your third idea, I personally don't care too much. maybe ask the IR3 devs in #freedreno what they prefer
16:29 karolherbst: but I think it shouldn't just be "allow two of the same"
16:29 karolherbst: if you go that route
16:29 glehmann: multiple instances of a single src type is just weird, and if we allow it, all common passes have to care about it to some degree
16:30 karolherbst: yeah... it's gonna be quite a lot of stuff to rework
16:30 karolherbst: but I think a lot of passes will need to care about it anyway if that's getting added
16:30 karolherbst: either by duplicating the deref -> index and coordinate lowering or whatever we have there, or by having some sort of loop
16:32 karolherbst: mhhhh
16:32 karolherbst: we could add an union to `nir_tex_instr`
16:32 karolherbst: it already contains a few op specific things
16:32 karolherbst: e.g. `tg4_offsets`
16:32 karolherbst: or is_gather_implicit_lod
16:32 karolherbst: and component apparently.. all tg4 stuff ...
16:33 glehmann: is_gather_implicit_lod is kind of gross and I would prefer to remove it at some point
16:33 markco: glehmann: I already talked to jnoorman (IR3 dev), and so far he was for the third idea, but he has limited experience with the texop side of things
16:34 karolherbst: I think the issue here is, that it's going to touch a bit of code regardless of the solution
16:35 karolherbst: the issue with coordinate is, that there are tex operations which don't take one, so that's also a bit of fun...
16:35 markco: Yeah, the only solution which (probably) touches the least amount of NIR code would be making it an intrinsic and duplicating a lot of the texop logic for it
16:35 karolherbst: though... maybe splitting up the `src` field wouldn't be the worst idea...
16:35 glehmann: markco: the third idea is fundamentally incompatible with nir_tex_instr_src_index
16:36 karolherbst: I think we should go for a mix of 2 and 3
16:36 markco: Yeah, and it would need to be modified for it
16:36 karolherbst: question is just, what should the result look like
16:36 alyssa: i'm inclined to nak #3 for the reason glehmann says
16:37 markco: karolherbst: So a src2 of sorts with `nir_tex_instr_src2_index`?
16:37 karolherbst: I think we should have two arrays of sources, one for the "target" and the other for the "reference"
16:37 alyssa: please no.
16:37 karolherbst: every solution will suck, I can already see 2 being ugly enough
16:38 karolherbst: basically means duplicating a lot of stuff and checking "is this 1 or 2?"
16:38 markco: Yep, this is really about which one sucks the least so that it's acceptable upstream..
16:38 alyssa: yes, which is why glehmann and I are asking you to pick a solution that sucks for ir3 instead of a solution that sucks for everyone including ir3
16:39 karolherbst: do we anticipate others wanting something similar long-term?
16:39 jenatali: Going from 2 derefs to 3 doesn't seem like a big deal to me?
16:40 jenatali: There's already texture and sampler. Adding a reference or whatever it's called seems fine
16:40 alyssa: I think that's my preference yeah
16:40 karolherbst: the second pair can also be texture+sampler thing
16:40 glehmann: tex source types are cheap, so I would just add nir_tex_src_ref_coord and similar for all the things you need
16:40 alyssa: new src types for reference_image_deref or whatever
16:40 alyssa: ^ yeah
16:41 karolherbst: yeah.. we can go for that for now, I just think the end result will also look bad enough that we consider doing it a bit differently
16:41 karolherbst: but ...
16:41 karolherbst: it would touch the same parts of the code
16:41 markco: Right, that theoretically works. I can handle most of the questionable bits within IR3.
16:41 alyssa: +1
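
The option the channel converged on (distinct source types for the reference image) can be sketched with a toy model. This is a hedged sketch: every `toy_*` / `TOY_*` name is hypothetical, loosely modelled on the names floated above, and it is not the real `nir_tex_instr` layout. The lookup helper returns the first source of a given type, which is why it only works when each type appears at most once per instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical source types: the existing "target" set plus a "reference" set. */
enum toy_tex_src_type {
    TOY_TEX_SRC_COORD,
    TOY_TEX_SRC_TEXTURE_DEREF,
    TOY_TEX_SRC_SAMPLER_DEREF,
    TOY_TEX_SRC_REF_COORD,
    TOY_TEX_SRC_REF_TEXTURE_DEREF,
    TOY_TEX_SRC_REF_SAMPLER_DEREF,
};

struct toy_tex_src {
    enum toy_tex_src_type type;
    int value;   /* stand-in for the SSA value the source refers to */
};

struct toy_tex_instr {
    unsigned num_srcs;
    const struct toy_tex_src *srcs;
};

/*
 * Return the index of the source with the given type, or -1 if absent.
 * A single return value can only name one source, so this relies on
 * each type appearing at most once.
 */
static int toy_tex_instr_src_index(const struct toy_tex_instr *instr,
                                   enum toy_tex_src_type type)
{
    assert(instr != NULL);
    for (unsigned i = 0; i < instr->num_srcs; i++) {
        if (instr->srcs[i].type == type)
            return (int)i;
    }
    return -1;
}
```

With separate `REF_*` types, existing passes that look up the coordinate or the texture deref keep finding the target image's sources unchanged, and only code that cares about block matching needs to query the new types.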
16:42 karolherbst: I'm sure it's also fair to add a bit of lowering to nir passes if that's not too ugly
16:43 glehmann: what does this look like in hw? does adreno have actual instructions that take multiple descriptors?
16:43 markco: Yep, it does
16:44 markco: It's more or less identical to the SPIR-V instruction
16:45 glehmann: that's quite a lot of fixed function for 2025 :D
16:46 karolherbst: well it's just two additional registers I guess
16:46 karolherbst: for an op not really taking anything else
16:46 alyssa: jenatali: could I get some help with https://gitlab.freedesktop.org/mesa/mesa/-/jobs/70363354 ?
16:46 jenatali: Unrelated question: Our video folks are wanting to get a build environment that's gallium without nir for (e.g.) libva. Looks like that needs a meson option to turn off adding libnir to the build. Would folks be amenable to that?
16:46 alyssa: I really don't know why that assertion would fail but only on windows/msvc
16:46 jenatali: alyssa: Looking
16:46 alyssa: thanks
16:47 jenatali: I've got a hunch...
16:47 karolherbst: jenatali: va doesn't seem to use any nir code, right?
16:48 jenatali: karolherbst: Right
16:48 jenatali: But right now just building gallium pulls in nir, and requires an actual definition of (at least) nir_print_shader
16:48 alyssa: it seems.. questionable but I don't care (:
16:48 karolherbst: well.. cleaning up the linking aspect of the build system isn't necessarily a bad idea :D
16:48 karolherbst: I've done a bit of that last year or so
16:48 jenatali: There's stuff in gallium/aux that assumes it's there
16:48 karolherbst: rely on the linker to be smart enough?
16:49 karolherbst: drivers will pull in nir already anyway, so I don't think a frontend which doesn't use nir should do it
16:49 alyssa: jenatali: umm
16:49 alyssa: vl_compositor
16:49 alyssa: va depends on nir
16:49 jenatali: This is a little less about binary size and more about build times... and also the fact that we've got security policies that are applying static analysis requirements to everything we build :(
16:49 jenatali: alyssa: We've got a currently downstream frontend that doesn't use that but also plugs into the gallium interface
16:50 karolherbst: I mean..... getting static analysis on nir wouldn't be the worst idea
16:50 karolherbst: probably
16:50 jenatali: Heh, it requires a bunch of build warnings enabled, some of which are... really dumb
16:50 karolherbst: I think my take is that it depends on the patches and what they look like
16:50 alyssa: you know what. i am too exhausted to care. go wild y'all
16:50 karolherbst: jenatali: as in "a complete disaster to fix"?
16:50 jenatali: :P
16:50 alyssa: a-b's for everyone, yippee!
16:51 jenatali: As in there's nothing wrong with the code, so the "fix" is just applying dumb changes that make the code harder to read to silence dumb warnings
16:52 karolherbst: mhhhh
16:52 karolherbst: if I'd have to take a guess, around VLAs?
16:52 jenatali: That's one of 'em
16:52 karolherbst: would this "this field is the length of this VLA" feature fix some of that?
16:53 karolherbst: because I do kinda think we should use those new attributes anyway
16:53 jenatali: In C++ "flexible array entries" are technically undefined behavior and MSVC has a warning that says that using them is a vendor extension
16:53 karolherbst: mhhhh
16:53 jenatali: So just including nir.h triggers that warning since it uses those for things like nir_src for intrinsics
16:54 karolherbst:makes head scratching noises
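
For readers unfamiliar with the construct being discussed, here is a hedged sketch of a C99 flexible array member. The struct is a hypothetical stand-in, not the real NIR intrinsic type. The trailing array has no declared size, and the allocation adds room for however many elements are needed. This is valid C since C99 but is not part of standard C++, which is why a C++ compiler may flag a header that contains it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an instruction with a variable number of sources. */
struct toy_intrinsic {
    unsigned num_srcs;
    int srcs[];   /* flexible array member: must be the last field */
};

/* Allocate the header plus num_srcs trailing elements in one block. */
static struct toy_intrinsic *toy_intrinsic_create(unsigned num_srcs)
{
    struct toy_intrinsic *instr =
        calloc(1, sizeof(*instr) + num_srcs * sizeof(instr->srcs[0]));
    assert(instr != NULL);
    instr->num_srcs = num_srcs;
    return instr;
}
```

A caller would use `instr->srcs[i]` for `i < instr->num_srcs` and release the whole object with a single `free(instr)`.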
16:54 jenatali: alyssa: Does bindgen write files opened in binary mode or text mode?
16:54 jenatali: I'm failing to quickly find where the source for that ended up
16:55 jenatali: Ah yeah there it is, fopen(outhfile, "w")
16:55 jenatali: Make that "wb" and you should be good
16:55 alyssa: jenatali: ...Okay
16:55 karolherbst: jenatali: I think I want to see the patches and how terrible this all will be :D
16:56 alyssa: thanks, I think heh
16:56 jenatali: I saw this with spirv-as.exe in the past, text mode messes with binary code to convert line endings
16:56 alyssa: amazing
16:56 jenatali: karolherbst: Yeah it's not *too* bad
16:57 karolherbst: different line endings were a mistake
16:57 karolherbst: can we just.. stop? :D
16:58 jenatali: I wish :(
16:58 alyssa: jenatali: fixes in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33357 , thanks!
16:59 jenatali: Oh I had the wrong fopen, whoops :)
16:59 alyssa: eh, close enough
17:00 alyssa: :P
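
The text-mode versus binary-mode issue can be sketched as follows. This is a hedged example with hypothetical helper names, not the code from the merge request. On Windows, a stream opened with `"w"` translates each `\n` byte into `\r\n` on output, which corrupts binary data that happens to contain the byte `0x0a`. Opening with `"wb"` writes the bytes unchanged, and on POSIX systems the two modes behave the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write len bytes to path in binary mode. Returns 0 on success, -1 on failure. */
static int write_blob(const char *path, const unsigned char *data, size_t len)
{
    assert(path != NULL && data != NULL);
    FILE *f = fopen(path, "wb");   /* "b": no line-ending translation */
    if (!f)
        return -1;
    size_t written = fwrite(data, 1, len, f);
    int close_err = fclose(f);
    return (written == len && close_err == 0) ? 0 : -1;
}

/* Count the bytes actually stored in the file, or return -1 if it cannot be opened. */
static long file_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    long n = 0;
    while (fgetc(f) != EOF)
        n++;
    fclose(f);
    return n;
}
```

Writing a four-byte buffer that contains `0x0a` and checking that `file_size` reports four bytes is a quick way to confirm that no translation took place.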
17:32 radiosumnrnone: the procedure kind of works, but requires a division by 4 the least so far (though all members of a bank yield deterministic), in other words the delta is only going increase to the point i no longer can handle it very easy, and 24/5 is 3 same as 12/4 is 3, you know i can not seem to land the hack, it's over my budget mission, but the filesystem is likely possible. so 141+3 is 144/2 is
17:32 radiosumnrnone: 72 is the only last red indian standing , so the dependency chain i posted, is only working algorithm so far for data compression, but needs larger base address calculated, but the vaue shifts too much away, i did not find a better way. So technology world is somewhat horribly difficult and such things i seem not be capable of pushing through from. Dog makes sure i could not connect no
17:32 radiosumnrnone: one in this world, he is some nutbolt trashbag.
17:59 DemiMarie: jenatali: Consider yourself lucky you are not using GLib. That will happily call functions with too few arguments, too many arguments, or with arguments with wrong pointer types. Oh, and it expects that to work.
18:05 jenatali: Ouch
18:05 HdkR: variadic the world!
18:05 HdkR: :D
18:10 ricomandale: it's 24/8=3 of course, anyhow the procedure needs more work, but that is the start of longer journey, as i did get 4 cells all to work so.
18:11 ricomandale: i am getting awesomly tired, and it shows how this point of interest delta is shifting away.
18:32 alyssa: aaaand the ppc64 and s390x builds failed..... cool..
18:33 alyssa: these builds passed a few days ago
18:34 alyssa: something llvm 15->19 related..
18:35 glehmann: does s390x still have users? (not saying we should remove support for it, just curious)
18:36 alyssa: apparently yes
18:38 alyssa: it wasn't the LLVM uprev, since these jobs passed on Friday and that was already merged last week
18:40 alyssa: oh.. meson.build change in 82047fa82f1 ("amd: drop support for LLVM 15, 16, 17")
18:40 alyssa: umm.
18:40 alyssa: ok.
18:41 alyssa: even worse, s390x and ppc64le are failing in different ways?!
18:41 alyssa: s390x is missing spirv-tools but ppc64el is getting the wrong version of llvmspirvlib
18:42 karolherbst: ... "fun"
18:42 alyssa: karolherbst: did you land something to break this
18:43 alyssa: seemingly no
18:43 karolherbst: I don't think so.. well I bumped the spirv-tools version, but I don't see why that would break things this way
18:43 alyssa: right..
18:43 alyssa: meson.build:1883:21: ERROR: Dependency lookup for LLVMSPIRVLib with method 'pkgconfig' failed: Invalid version, need 'LLVMSPIRVLib' ['< 15.1'] found '19.1.0.0'.
18:43 alyssa: this raises so many questions I don't know how to answer
18:43 karolherbst: mhhhhhhh
18:43 karolherbst: maybe a hardcoded 15 somewhere?
18:44 karolherbst: ".gitlab-ci/container/build-libclc.sh:LLVM_TAG="llvmorg-15.0.7"" doesn't look great tbh
18:44 alyssa: yeah, idk why s390x and ppc64le is special though
18:45 karolherbst: `&debian-ppc64el-llvm 15`
18:45 karolherbst: `debian/ppc64el_build`
18:45 karolherbst: `LLVM_VERSION: &debian-ppc64el-llvm 15 # no LLVM packages for PPC`
18:45 karolherbst: I guess that would do it
18:46 alyssa: so that explains half of this, I guess
18:46 alyssa: maybe my Friday pipeline wasn't rebased enough to get the llvm bump
18:52 alyssa: Unfortunately this is where my understanding of CI ends
18:52 alyssa: so I think I have to tap out now. alas
19:01 DemiMarie: glehmann: I susoecto
19:01 DemiMarie: Ooops
19:01 DemiMarie: Typo
19:02 DemiMarie: I doubt any of the accelerated drivers are used on mainframes.
19:10 Lynne: airlied: I'm seeing issues with av1 decoding on radv
19:11 Lynne: something looks like it goes wrong with the refs
19:53 linyaa: qq. is anyone working on new uapi that requests a drm commit at a user-provided target present time?
19:54 linyaa: context: i'm investigating the progress of VK_EXT_present_timing.
19:54 linyaa: and Android hwcomposer apis.
20:04 emersion: linyaa: iirc we figured the compositor should be able to make it work without KMS additions
20:35 zamundaaa[m]: emersion: with VRR, scheduling with the CPU is unfortunately not good enough. For that, we do want some API
20:36 zamundaaa[m]: For FRR it should indeed not be needed though
20:59 linyaa: i agree with zamundaaa
21:22 emersion: ah yeah
21:27 sima: linyaa, emersion I think we had some endless bikeshed on that vk discussion but I guess adding a target frame timestamp prop for atomic flips is probably the way to go
21:27 sima: there's a bit of pain on the backend since currently scheduled flips for FRR aren't even exposed through atomic either
21:27 emersion: I agree
21:28 emersion: with "not before" semantics probably
21:28 sima: so might be a bit of a mess from an uapi pov if you can get target ts through atomic but target frame through legacy only
21:28 sima: but then probably not the worst we've done
21:36 linyaa: another uapi piece that VK_EXT_present_timing wants is a feedback timestamp that indicates either (a) when the display controller began scanning out of the framebuffer memory or (b) when the first pixel becomes visible. (i assume that [a] is easier on most hw).
21:36 emersion: we already have that
21:36 linyaa: oh good
21:37 linyaa: what is the api bit that i can grep for?
21:50 emersion: struct drm_event_vblank
21:52 sima: yeah vblank events should be really precise and correct for first pixel going out through the connector, as much as the hw can do
21:52 sima: so not taking sink latency into account if it's external
21:53 sima: *corrected for first pixel
21:53 emersion: yup
21:53 sima: unfortunately we don't have uapi that tells you how much it's been corrected or whether it's just a timestamp grabbed in the irq handler :-/
21:53 sima: but would be fairly easy to add that somewhere
21:56 sima: could even expose the resolution of the clock we use for correcting (iirc it's either pixel clock, line counter or some hw timestamp with a sub-usec resolution)
21:57 jannau: the apple firmware based display controller very likely needs frame timestamps for VRR. don't have to be user space generated but it would of course be easier
21:57 sima: jannau, currently VRR is "asap"
21:58 sima: but that has really bad amounts of jitter since you go through an entire fairly big syscall and async work to get there
21:58 sima: maybe even more depending upon hw
22:04 jannau: "asap" is unfortunately not quite asap with apple's FW (at least with the way we currently use it). if my understanding is correct a swap takes 1-2 ms
22:09 sima: yeah with fw there's probably a bit more latency on top that all adds up
22:22 zamundaaa[m]: Knowing that latency would also be very important for userspace, with or without a scheduling API
22:33 linyaa: For precise timing info (requests and feedback), I think it's also important to specify which clock timeline these timestamps occur on. IMO, precise timing info is more important than trying to shove everything onto the same CLOCK_MONOTONIC timeline.
22:35 linyaa: VK_EXT_present_timing supports having different timing events occurring on different timelines. For example, CLOCK_MONOTONIC vs the display controller's (perhaps different) clock timeline. VK has functions to map time values from different clocks onto a unified timeline (such as CLOCK_MONOTONIC).
22:36 linyaa: We don't want the upstream KMS api to make VR users vomit, yknow :-) Precision prevents vomit.
22:37 linyaa: (disclaimer. i'm a vulkan person, not a kms person. i may say uninformed things regarding kms).
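
The idea of mapping timestamps between clock timelines can be sketched with plain arithmetic. This is a hedged sketch under strong assumptions: all names are hypothetical, the "display clock" is an invented example, and both clocks are assumed to tick at the same rate with no drift. A single calibration sample (the same instant read on both clocks) is then enough to translate a timestamp from one timeline to the other. It is not how any Vulkan or KMS interface is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* One calibration sample: the same instant read on two clocks, in nanoseconds. */
struct toy_clock_pair {
    int64_t monotonic_ns;   /* e.g. a CLOCK_MONOTONIC reading */
    int64_t display_ns;     /* reading of a hypothetical display-controller clock */
};

/* Map a display-clock timestamp onto the monotonic timeline (no drift assumed). */
static int64_t toy_display_to_monotonic(struct toy_clock_pair cal, int64_t display_ts_ns)
{
    assert(cal.monotonic_ns >= 0 && cal.display_ns >= 0);
    return cal.monotonic_ns + (display_ts_ns - cal.display_ns);
}

/* Inverse mapping: monotonic timestamp onto the display-clock timeline. */
static int64_t toy_monotonic_to_display(struct toy_clock_pair cal, int64_t monotonic_ts_ns)
{
    assert(cal.monotonic_ns >= 0 && cal.display_ns >= 0);
    return cal.display_ns + (monotonic_ts_ns - cal.monotonic_ns);
}
```

The precision of such a mapping is bounded by how closely together the two calibration readings were taken and by any drift between the clocks. That is one reason to report a timestamp on the clock it was measured with and leave the conversion to the consumer.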