01:37mareko: should vote/ballot/elect instructions be intrinsics or ALU instructions? what about scan/reduce?
01:41mareko: Venemo: do you remember who mentioned splitting the divergent flag into subgroup-divergent and quad-divergent or even uniform (convergent across workgroups)?
01:41mareko: *subgroups
01:53zmike: I vote intrinsics so everyone doesn't have to change their compilers
01:57Lynne: same algorithm, but 100x faster prefix sum algorithm, yet 20x slower than opencl... it's a start
01:58Lynne: the memory barriers between each dispatch really kill performance
02:13DemiMarie: ?
02:13DemiMarie: I take it that GPUs despise barriers.
02:14Lynne: they do, but this is far worse than I expected
02:16jenatali: +1 to intrinsics. Is there a reason to want it to be ALU?
02:19alyssa: aren't they already intrinsics?
02:20alyssa: jenatali: nominally constant folding and consistency with fddx
02:21jenatali: I guess you could constant fold specific constants, but that seems like it belongs in nir_opt_intrinsics instead
02:21jenatali: And yeah honestly fddx seems a bit more like an intrinsic than an alu to me 🤷
02:26alyssa: valid
02:27alyssa: it's definitely a funny one
02:27alyssa: The constant folding for fddx is kinda funny though
02:28alyssa: jenatali: FWIW hardware vendors don't agree on what fddx is, either
02:28alyssa: On AGX, it's regular floating point ALU.
02:28alyssa: On Midgard, it's a texture instruction.
02:28jenatali: You weren't kidding on constant folding for fddx being funny
02:28jenatali: All I know is WARP, where it's also a texture instruction
02:29alyssa: On Valhall, it's lowered to cross-lane permutes (I forgot the API name for that) and ALU
02:29jenatali: Wait no I'm thinking of calculateLod, it's just an ALU there
02:29alyssa: where the permute is on the special function unit along with eg frcp
02:30alyssa: jenatali: "What's the derivative of this constant input?" "0" :-D
02:30jenatali: Which lanes have true for this constant true value? :P
02:31alyssa: :D
02:31jenatali: Trick question! Which lanes are active ;)
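The joke about folding a constant-true vote can be made concrete with a small CPU model. This is only a sketch: it assumes a subgroup of at most 64 lanes represented as a bitmask, and the names `lane_mask`, `lane_bit`, `ballot`, `vote_any`, `vote_all` and `ballot_const_true` are made up for illustration, not NIR or driver API. It shows why `ballot(true)` cannot fold to a compile-time constant: the result is whatever the active-lane mask happens to be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per lane; bit i set in `exec` means lane i is active. */
typedef uint64_t lane_mask;

/* Hypothetical helper: the mask with only `lane` set. */
static lane_mask lane_bit(unsigned lane)
{
    assert(lane < 64);
    return (lane_mask)1 << lane;
}

/* ballot: which active lanes have a true predicate? */
static lane_mask ballot(lane_mask exec, lane_mask pred)
{
    return exec & pred;
}

/* vote_any: is the predicate true on at least one active lane? */
static bool vote_any(lane_mask exec, lane_mask pred)
{
    return (exec & pred) != 0;
}

/* vote_all: is the predicate true on every active lane? */
static bool vote_all(lane_mask exec, lane_mask pred)
{
    return (exec & ~pred) == 0;
}

/* "Folding" ballot(true): the predicate is all-ones, so the result is
 * the active mask itself, which is only known at run time. */
static lane_mask ballot_const_true(lane_mask exec)
{
    return ballot(exec, ~(lane_mask)0);
}
```

In this model `vote_all(exec, all_ones)` is always true, so that case does fold, while the ballot of the same constant does not.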
02:31alyssa: TBH, the distinction between ALU and intrinsic has always been entirely artificial
02:31alyssa: but, meh, it seems like a useful kind of artificial
02:31jenatali: Yeah
02:32alyssa: Midgard has a firm {ALU, load/store, texture} split to its instructions
02:33alyssa: for goofy VLIW reasons
02:33alyssa: and even that doesn't map completely to NIR
02:34jenatali: Btw, anybody wanna review a Windows WSI / Vulkan runtime change? The only name I know to ping is gfxstrand and I feel like that's not fair :P
02:34alyssa: oh right yes
02:34alyssa: barriers were also texture instructions
02:34alyssa: for.. Reasons
02:34alyssa: :D
02:35jenatali: ... wat
02:35alyssa: jenatali: I mean, the texture pipe already had the logic needed to do barriers because texture2D() implicitly does barriers to calculate derivatives
02:36jenatali: Huh
02:36alyssa: and when you're trying to implement OpenCL on your GLES2 GPU, oh look we already have barriers nice let's just tweak that
02:36alyssa: :p
02:36alyssa: (barrier for a workgroup vs a quad, pff details)
02:39jenatali: I should go enable subgroups for CL and GL now that I added them for VK
03:02Lynne: the correct stages for a compute->compute memory barrier are compute+compute, right?
03:02Lynne: why is using top/bottom of pipe instead basically a noop, but a compute/compute kills performance?
03:03Lynne: in theory top/bottom should be slower as it afaik immediately inserts a barrier, of which if there are multiple, would be required to be done in-order
03:17Lynne: nvm, forgot the order is src=bottom, dst=top, same performance pretty much as compute+compute
03:21mareko: jenatali: algebraic with vote/ballot/elect might be interesting
03:28mareko: on AMD, vote is ALU, quad_swizzle is ALU, fddx is quad_swizzle + fsub
03:30mareko: it's really about how do we make it easy to write opt_algebraic-style pattern matching for intrinsics
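As a rough illustration of "fddx is quad_swizzle + fsub", here is a CPU model of one 2x2 quad. It is a sketch under assumptions: lanes 0 and 1 are taken to be the top row and lanes 2 and 3 the bottom row, the fine (per-row) derivative variant is shown, and `quad_swizzle` and `fddx_fine` are illustrative names rather than real hardware or NIR opcodes.

```c
#include <assert.h>

#define QUAD_LANES 4

/* Model of a quad swizzle: lane i reads the value held by lane sel[i]. */
static void quad_swizzle(const float in[QUAD_LANES],
                         const unsigned sel[QUAD_LANES],
                         float out[QUAD_LANES])
{
    for (unsigned i = 0; i < QUAD_LANES; i++) {
        assert(sel[i] < QUAD_LANES);
        out[i] = in[sel[i]];
    }
}

/* Fine fddx: within each row, right pixel minus left pixel.
 * Assumed layout: lanes {0,1} are the top row, {2,3} the bottom row. */
static void fddx_fine(const float in[QUAD_LANES], float out[QUAD_LANES])
{
    static const unsigned left[QUAD_LANES]  = {0, 0, 2, 2};
    static const unsigned right[QUAD_LANES] = {1, 1, 3, 3};
    float l[QUAD_LANES], r[QUAD_LANES];

    quad_swizzle(in, left, l);   /* two swizzles ... */
    quad_swizzle(in, right, r);
    for (unsigned i = 0; i < QUAD_LANES; i++)
        out[i] = r[i] - l[i];    /* ... followed by one fsub per lane */
}
```

Feeding the same value to all four lanes yields zero on every lane, which is the constant-folding case discussed earlier in the log.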
05:35danvet: mlankhorst, just pushed the deadline fix to drm-misc-next, if you can do a pr still today would be great to propagate the fix (there's two in total)
06:16dj-death: Lynne: can try
06:16dj-death: Lynne: need to include https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22302 which is going to make that a bit painful
06:57mareko: idr: FYI https://gitlab.freedesktop.org/mesa/mesa/-/issues/7161
07:12Venemo: mareko: I mentioned changing divergence analysis to know the difference between wave-uniform and workgroup-uniform, I hadn't thought about quads yet
07:24danvet: tzimmermann, ok if I just apply your patch on top of my series and send it all out together?
07:25danvet: with the other polish comments from javierm addressed I mean
07:25tzimmermann: danvet, i can.
07:25tzimmermann: give me a bit
07:25danvet: oh that works for me too, I was just about to start rebasing in the polish
07:26danvet: I'll go look more at lina's rfc then
07:26danvet: well need to sort out laundry first
07:36lina: I still need to do a bunch of rebasing and other stuff for the Rust dependencies so I'll reply piecewise if that's okay ^^
07:37lina: danvet: Also is your Mail-Followup-To intentional? That really confused me (see emails, I only realized what it was after that)
07:37lina: I should sort out laundry too...
07:39danvet: lina, is something wrong with the follow-up-to?
07:39danvet: it's just mutt, I have no idea how any of this works
07:40danvet: lina, also no worries on being slow with replies, I've been way too slow imo too so that's mostly all on me :-/
07:40danvet: I think I have some minor questions on the drm_mm wrappers
07:40danvet: and then a big can of worms I have no clear idea on for the core drm_device abstraction and how the traits should work
07:41lina: danvet: It has everyone but you, so I was really confused when my thunderbird moved all the CCs to To (since there was no To left) and you weren't on the list any more.
07:43lina: For the device removal stuff we'll have to talk to the other Rust folks since that whole problem (hardware removal vs device struct vs driver/UAPI data) is something intended to be solved at the top layer already, and I have to admit I'm not 100% clear on the rules myself. Plus that whole thing is changing anyway on the base kernel abstraction...
07:43lina: Also my driver is kind of a terrible example for this because you can't hot-remove the GPU and even if you tried, you can't hot-add it again (there is no way to reset the firmware to startup state) so we can't test reconnect cycles...
07:46danvet: lina, yeah the device lifetime fun is I think something that needs serious work
07:46danvet: and imo really not a blocker for asahi
07:46danvet: maybe need to poke boqun et al. whether we should have a separate session for that
07:47danvet: lina, yeah I know ... I still think hotremoval (i.e. unbind driver in sysfs while userspace is running) is a good thing to validate whether your lifetime rules are solid or not
07:47danvet: lots of fun stuff starts to happen
07:47lina: The idea on the generic Rust device side is that device data is split into driver data itself, and Resources, and the Resources are accessed via a revocable container, so when the device goes away all those accesses fail (and there's some locking/RCU/something magic to make sure those Resources don't go away while they are being used, though of course on real hardware they'd break on hot-remove anyway, it
07:47danvet: and it's as close to real hotunplug as you can get (it's how the hotunplug igts work too)
07:47lina: just means the actual structures exist)
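The revocable-container idea can be sketched in a few lines. This is a deliberately single-threaded toy in C, not the Rust abstraction being discussed: the real thing relies on locking or RCU, which is replaced here by a plain user count, and every name (`struct revocable`, `revocable_try_access`, `revocable_release`, `revocable_revoke`) is hypothetical. What it shows is the ordering: after a revoke, new accesses fail, while the guarded object is only considered torn down once the last in-flight user lets go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct revocable {
    void *resource;   /* the guarded object, e.g. a register mapping */
    unsigned users;   /* accesses currently in flight */
    bool revoked;     /* set once the device has gone away */
    bool torn_down;   /* set once nobody can still be using `resource` */
};

/* Start an access; returns NULL if the resource was already revoked. */
static void *revocable_try_access(struct revocable *r)
{
    assert(r != NULL);
    if (r->revoked)
        return NULL;
    r->users++;
    return r->resource;
}

/* End an access started by a successful revocable_try_access(). */
static void revocable_release(struct revocable *r)
{
    assert(r != NULL && r->users > 0);
    r->users--;
    if (r->revoked && r->users == 0)
        r->torn_down = true;
}

/* Device removal: refuse new accesses, tear down when idle. */
static void revocable_revoke(struct revocable *r)
{
    assert(r != NULL);
    r->revoked = true;
    if (r->users == 0)
        r->torn_down = true;
}
```

As noted in the conversation, this only guarantees that the structures still exist during an access; on real hardware the reads themselves may already be returning garbage.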
07:48danvet: yeah there's always a race
07:48lina: Yeah, I mean PCI is going to start returning 0xffffffff on reads or whatever on hot unplug, no way around that.
07:48danvet: as long as the revoke happens it's good, since after ->remove or whatever the resources could be reassigned to a different device
07:48danvet: e.g. thunderbolt iommu bars
07:49lina: We should be able to test the unbind case on my driver but only once per reboot, so it's good for smoke testing but kind of painful to do repeatedly.
07:49lina: Yeah
07:49danvet: lina, oh the fw dies if you unbind?
07:49lina: It doesn't, but there's no way to reinit it...
07:50lina: The init operation is one-time as far as I can tell, so you'd have to somehow carry over piles and piles of in-memory data structures
07:50danvet: oh so you'd need to keep all the channels/queues around?
07:50lina: Yeah, the whole of initdata...
07:50danvet: how does that work with vm? or the fw has some magic knowledge to reset at boot and only there?
07:50lina: VM?
07:50danvet: well I was thinking device pass-through
07:51danvet: I guess that also doesn't work?
07:51lina: Ah, it doesn't, other than m1n1 of course (but m1n1 is special)
07:51lina: Actually true device pass-through is impossible anyway, there's no real IOMMU for that
07:51danvet: ah yeah I guess then your driver really isn't a good vehicle to really test all the device lifetime stuff :-/
07:51lina: Yeah...
07:52lina: At least we can do something like spin up a dozen glmarks and pull the plug and make sure nothing blows up.
07:52lina: But that's about it...
07:52danvet: so I guess test it a bit, cover anything that's amiss in the todo, and then there's going to be big discussion for the long-term plan/design
07:52danvet: yup
07:52lina: Yeah
07:52danvet: I mean the C driver side isn't really good either, we're barely at the "in theory it's possible to do it right" stage
07:52lina: This is why I'm always so paranoid about firmware crashes (and why Rust was a good idea to help with that too), if that happens users have to reboot...
07:53lina: I fixed another one of those yesterday, I think I'm slowly running out of them.
07:53danvet: yeah userspace really doesn't cope well with a terminally dead gpu
07:53lina: Thankfully this stuff is rare enough and mostly seems to be corner cases around GPU faults anyway, which happens a lot more often with known-broken tests than things people are running.
07:53danvet: usually the compositor just keels over and the entire session dies
07:53lina: I think only one or two people (besides Alyssa and I) have seen firmware crashes so far
07:54lina: At least I made the crash handling nice now, it used to just hang everything. Now it kills all in-progress jobs and ENODEVs everything subsequent.
07:54lina: So hopefully the compositor dies and you at least get a TTY to dump dmesg or whatever
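The crash-handling policy described here (fail everything in flight, then return ENODEV for anything submitted afterwards) can be modelled as a tiny state machine. This is a sketch with invented names (`struct toy_gpu`, `toy_gpu_submit`, `toy_gpu_crash`, the `job_state` enum); it is not taken from the actual driver, and the use of -EBUSY for a full queue is an assumption made just to keep the example complete.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_JOBS 8

enum job_state { JOB_FREE = 0, JOB_PENDING, JOB_DONE, JOB_FAILED };

struct toy_gpu {
    bool crashed;                   /* firmware is terminally dead */
    enum job_state jobs[MAX_JOBS];  /* per-slot job status */
};

/* Returns a job slot index, -ENODEV after a crash, -EBUSY if full. */
static int toy_gpu_submit(struct toy_gpu *gpu)
{
    assert(gpu != NULL);
    if (gpu->crashed)
        return -ENODEV;
    for (int i = 0; i < MAX_JOBS; i++) {
        if (gpu->jobs[i] == JOB_FREE) {
            gpu->jobs[i] = JOB_PENDING;
            return i;
        }
    }
    return -EBUSY;
}

/* Firmware crash: kill all in-progress jobs and latch the dead state. */
static void toy_gpu_crash(struct toy_gpu *gpu)
{
    assert(gpu != NULL);
    gpu->crashed = true;
    for (int i = 0; i < MAX_JOBS; i++) {
        if (gpu->jobs[i] == JOB_PENDING)
            gpu->jobs[i] = JOB_FAILED;
    }
}
```

Because the crashed flag is never cleared, every later entry point that checks it fails the same way, which is what lets userspace notice the device is gone instead of hanging.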
07:56jannau: it might even restart with software rendering
07:56danvet: yeah that's the best you can do with what you have
07:56danvet: as long as the display keeps displaying
07:56danvet: ideally compositors would recreate their render ctx on llvmpipe and not die
07:56lina: That's a good point, I need to make sure the get params stuff fails with ENODEV too, right now I think we only fail other stuff.
07:57danvet: but I'm not sure whether that'll ever happen
07:57lina: That way mesa init will outright fail on that driver and skip it.
07:57danvet: emersion, ^^ ?
07:57lina: (We also don't have the robustness stuff hooked up in mesa properly yet I think)
07:57danvet: lina, so I think the super-clean solution would be to hotunplug the entire thing
07:57danvet: unload the driver
07:57danvet: then mesa should dtrt cleanly
07:57jannau: as long as the framebuffers remain valid while in use by the display the display should be unaffected
07:58lina: That's a good question, what happens to the GEM objects if you hotunplug?
07:58danvet: or maybe not unbind the driver, because you might need that for introspection stuff
07:58lina: The display controller is going to have those imported
07:58danvet: but drm_dev_unplug() or something like that at least
07:58danvet: which would give you a nice excuse to handle the hotunplug stuff :-)
07:59danvet: in case you're bored ...
07:59lina: Yeah, unless I soon get confident that fw crashes just aren't really a thing any more, I plan to have a coredump thing at some point in debugfs so users can take full GPU memory snapshots (of all the firmware stuff at least, which is conveniently in VA-contiguous heaps)
08:00lina: I also have a debug mode that puts allocator tags in-band so that's all handy to make sense of badness after the fact.
08:00lina: (Right now I just do all that via the hypervisor but that's not useful for end users)
08:05danvet: lina, there's devcoredump or something like that for device state dumps
08:05danvet: not yet used by drm drivers since it's fairly new
08:06tomeu: doesn't freedreno use it in Mesa CI?
08:06danvet: yay I guessed the name correctly
08:06danvet: tomeu, ah yeah we have a few by now
08:07HdkR: devcoredump in freedreno might even give you real crash information if you're lucky :)
08:07lina: Thanks! I'll look into that ^^
08:08danvet: yeah it's msm, etnaviv, amd, panfrost that seems to have support
08:08danvet: just i915 not using it
08:08danvet: (because that one predates devcoredump by quite a few years)
08:08danvet: pinged the xe folks to make sure they use that too
08:10lina: There's going to be a lot of refactoring to do on the driver and more abstractions to add... Just yesterday I fixed two firmware-involved deadlocks due to freeing interacting with faults. It's definitely going to need a lot of refactoring once we introduce a shrinker and really can't do allocs in the signaling paths and stuff like that...
08:10lina: I hope with your "merge early, refactor aggressively" idea this won't block upstreaming ^^ (otherwise I think we'd spend another half year+ getting all the necessary bits in, stuff like workqueues which I don't use right now...)
08:13lina: There's definitely places where I do something kind of dumb that could be better done with existing kernel features, it's just that I don't want to introduce even more abstraction dependencies than we already have right off the bat...
08:19danvet: lina, if you feel like a doc patch for drm/sched timed_out hook to point at devcoredump and maybe drm_dev_unplug in case of terminal fail might be good
08:19danvet: in general it'd be awesome if you can fix doc gaps because people ramping up notice them a lot easier
08:19danvet:always be volunteering people to improve docs :-)
08:20lina: That's a good point ^^ (though I also don't use that specific path since our firmware handles faults/timeouts for us)
08:21danvet: lina, oh you can set a per-job timeout and it just nukes it itself?
08:22lina: I think the timeouts are hardcoded or unknown fields of the initdata or something we haven't found yet. If a job faults or times out the whole firmware goes into a halt cycle, notifies the host along with a list of running jobs, then once told to resume resets the GPU and marks those jobs as complete and picks up after that.
08:23lina: I've only seen hangs that didn't result in a notification when something went horribly wrong, but I also am not sure if we can even recover from that because we can't just reset the firmware, so...
08:23danvet: yeah I guess for you that all would be terminal failures
08:24lina: Yeah, right now we just print a message but I could have it go into the "GPU crashed" codepath, it's just that so far I haven't seen any of those that wasn't traceable to something I screwed up so I'm not sure if it's worth plumbing in like that.
08:25lina: Those are rarer than outright crashes.
08:26lina: As for the actual faults, I use the fault cycle to mark all jobs the firmware claimed are pending as failed (but not complete), then after resume the natural (instant) notification of completion by the firmware signals the fences (which are now marked error, as well as the feedback BO getting the error data).
08:26lina: There's logic to distinguish between victim and culprit jobs etc and we do gather full fault data if available.
08:28danvet: hm pushing the culprit/victim tracking into drm/sched might be another one, maybe even some glue to provide the usual query ioctl you need for vk/arb_robustness
08:29lina: We currently have result buffers that tell userspace what happened to each job asynchronously, so it can find out after waiting on the fences (actually Alyssa pointed out that for async job cleanup we might as well just poll those buffers directly, it's way faster than an ioctl to do a nonblocking fence check so we'll probably do that)
08:30lina: I think drm_sched doesn't forward fence error markers right now between the hw fences and the job fences, which bothers me... but I'm not even sure if that normally ends up in userspace when waiting on sync objects either?
08:30danvet: lina, oh right I've seen that discussion, but didn't reply
08:31danvet: dma_fence error is kinda *meh* to "blows up in your face in interesting ways"
08:31danvet: i915-gem tried to go full blast in fence error propagation
08:31danvet: it resulted in app crashes taking down the compositor because it's a lot more tricky than it looks like
08:31danvet: due to that I'm very much of the opinion to just don't use it
08:31danvet: afaiui it's for android, and even there not really uapi
08:31lina: One thing I think I would like to do is try to make sure those errors propagate to the display controller at least, because especially once we introduce compression I'm not sure I want broken framebuffers ending up there...
08:32danvet: instead pass all gpu error state out-of-band to userspace in a query
08:32danvet: lina, pls no, been there, cried
08:32danvet: gpu crash recovery really is all up to userspace to re-render or better re-allocate everything
08:32lina: If we don't do that I hope DCP doesn't fault with arbitrary broken compressed framebuffers, because if it does that's another one we can't reboot... ^^;;
08:32danvet: unless your display dies when it scans out malformed compressed stuff?
08:33danvet: lina, yeah might be good to check
08:33danvet: but also, by design, this is impossible to fix
08:33lina: I know it dies when it tries to scan out an unmapped VA at least, and well... if compressed metadata is broken I could see that happening...
08:33danvet: userspace is allowed to lie about drm_fb metadata, and it can just cpu-write garbage into memory
08:33lina: And yeah, it's impossible to fix but at least we can try to make it not happen accidentally
08:33danvet: unless your compressed stuff is some kind of special stolen memory region
08:34lina: (Like when any random process faults and takes down a compositor job with it)
08:34danvet: we've looked into this all for a bit different reasons with memory encryption revocation for content protection
08:34danvet: that one's even more annoying since it can happen to currently live scanout buffers
08:35lina: DCP has some content protection stuff but I have no plans to look into that ^^
08:35danvet: yeah it's just cros that wants it for netflix
08:43javierm: tzimmermann: sorry that I couldn't review your patch before and I see now that you posted a v4...
08:43tzimmermann: javierm, no problem.
08:44javierm: tzimmermann: my comments were pretty trivial anyways and you could change before pushing if you agree
08:45javierm: tzimmermann: but I mentioned I think you should keep danvet's original S-o-B tag (even when doesn't match the author in the patches)
08:45javierm: *as mentioned
08:45danvet: yeah if I spot the checkpatch I just add them both
08:45danvet: I'm a bit confused on this
08:45tzimmermann: javierm, let me reply to the mail. I'm going to start the next bike shedding :P
08:46tzimmermann: danvet, you sent the patches from @ffwll.ch but the sob tag says @intel.com
08:46javierm: tzimmermann: feel free to ignore me then :) I really want this series to land since as you said this nvidia+fbcon+vfio thing comes frequently in the fedora bug reports
08:47javierm: danvet: I think checkpatch is just silly, do you know if doesn't complain if you have first your author email S-o-B and then intel's ?
08:47danvet: yeah I think if it's both it's ok
08:47javierm: danvet: I think that even with both complains if the first S-o-B doesn't match the author
08:48javierm: that is intel and then ffwll.ch will still make checkpatch complain
08:49javierm: danvet: the reason why we need tzimmermann's patch is IMO just to make the code more readable and easier to understand. But from a functional POV is the same I believe
08:50tzimmermann: javierm, this ^
08:56danvet: yeah I put an ack on it
09:46mareko: Venemo: any other comments on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21861 ?
10:00Venemo: mareko: I am going to take another look soon, but at the moment I'm busy with some other stuff that I'd like to get into the branch point
10:21javierm: mairacanal: I'm looking at the thread https://lists.freedesktop.org/archives/dri-devel/2023-April/398958.html
10:22javierm: mairacanal: I'm not sure if Marius will be able to achieve what they want unless the vkms driver is migrated from the simple-KMS helpers
10:24javierm: mairacanal: because as you correctly pointed out, that patch is creating multiple simple pipelines, not just multiple connectors
10:24javierm: tzimmermann: what do you think ? ^
10:27tzimmermann: javierm, mairacanal, vkms uses simple-kms?
10:28tzimmermann: it only uses simple_encoder_init, which should be fine
10:29javierm: tzimmermann: ah, Ok. I just noticed that included <drm/drm_simple_kms_helper.h>
10:31javierm: but yeah, no drm_simple_display_pipe_init()
10:34javierm: tzimmermann: it seems simple_encoder_init() doesn't really add that much. It's just a wrapper around drm_encoder_init() with a drm_simple_encoder_funcs_cleanup that has .destroy set to drm_encoder_cleanup
10:34javierm: wonder if we should just inline that in vkms and get rid of the drm/drm_simple_kms_helper.h inclusion
10:36dolphin: airlied, danvet: see #intel-gfx, there is a request to do a backmerge to pull in some deps to drm-intel-gt-next, so do ping on the mail when you've merged the current PR.
10:37dolphin: I'll then do a backmerge to sync with drm-next
10:51danvet: dolphin, drm-next is at -rc4, that should be good enough for nirmoy?
10:57dolphin: nirmoy: jani: ^^ is -rc4 good enough?
10:58dolphin: nirmoy: I have to drop now, if it's not enough then let me know and I'll ask drm-next to be backmerged to more recent version and wait with the backmerge
10:59nirmoy: dolphin danvet: No, I don't see https://patchwork.freedesktop.org/patch/518617/ in drm-intel/drm-intel-gt-next yet
10:59dolphin: nirmoy: is the patch you need in drm-next already?
11:00dolphin: no point in doing a backmerge if that doesn't resolve the issue :) have to first backmerge mainline to drm-next and only then drm-next to drm-intel-gt-next
11:01nirmoy: dolphin: Yes, it is in drm-next
11:01danvet: dolphin, dont backmerge pls
11:01danvet: there's the broken deadline stuff in drm-next rn
11:01danvet: and the bugfixes are only in drm-misc-next
11:02danvet: I don't think it has an impact on i915 since that hand-rolls much of atomic, but I'm frankly not sure
11:02dolphin: ok, I'll wait then until that is in
11:02danvet: maybe nag mlankhorst to get that pr going
11:02danvet: dolphin, I'll maybe also pull in your pr first?
11:02dolphin: yeah, I was hoping that
11:02danvet: ok
11:02dolphin: that's why the original ping, maybe I was too verbose :)
11:03dolphin: mlankhorst: ^^ you have thus been nagged about preparing drm-misc-next PR so that we can unblock this chain of backmerges
11:03dolphin: s/backmerges/{back,}merges/
11:06nirmoy: thanks danvet, dolphin for looking into this.
11:11tzimmermann: javierm, we should IMHO
11:12javierm: tzimmermann: yeah, posted two patches for vkms and Cc'ed you
11:13tzimmermann: simple_encoder_init was added because many drivers only had this pattern. i kind of regret adding it now, because it is a midlayer with little value.
11:13javierm: tzimmermann: exactly
11:43jani: dolphin: yeah -rc1 is enough
12:16melissawen: about the vkms multiple crtcs thing, I don't see why we need to attach the overlay planes initialization per CRTC. afaik, it's not the approach followed by vc4 and amd, where only primary and cursor planes are attached to the number of crtcs, I mean the "N planes per each crtc" thing.
12:17melissawen: can anyone elaborate it better?
12:19emersion: melissawen: hm, i guess it would be nice to have a way to configure this
12:20emersion: it's the same for primary/cursor planes fwiw
12:20emersion: some drivers advertise that a primary plane is compatible with all CRTCs
12:21emersion: maybe it makes vkms simpler to make overlays specific to a single CRTC? so that would be a good first step and can be generalized later?
12:21emersion: (i don't know, i haven't looked at the patches)
12:29melissawen: emersion, I liked your idea of restricting overlay planes to a single CRTC for now. At least it sounds better to me than creating N overlay planes for each CRTC. I see i915 does it, but some other drivers keep overlay planes more independent.
12:31melissawen: I was in doubt if it would be a requirement, thanks for extending to the crtc compatibility thing too
12:32emersion: hm, i thought restricting each overlay plane to a single CRTC was what the patchset was already doing
12:32emersion: i should probably stop talking before looking at the code :P
12:37danvet: jani, dim still managed to parse the drm-misc-next pull at least :-)
12:38danvet: melissawen, yeah there's largely two kinds of hw, those where planes are fixed to a single crtc, and those where they're freely reassignable
12:39danvet: since the latter might result in some fun bugs maybe best left to a 2nd step (and with a vkms_config knob perhaps too?)
13:02melissawen: hmm.. mixed feelings
13:03melissawen: but I'll write some comments in the thread
13:05danvet: melissawen, I think for now I'd keep the num_planes as indicating a per-crtc value, that's probably simplest
13:05danvet: maybe even keep that when they're freely assignable, lots of displays have tons of planes
13:06alyssa: danvet: lina: I assume our plan for DCP (Digital Content Protection) is yarr mate?
13:06alyssa: It's a, type of South American tea.
13:06danvet: alyssa, I don't care either way
13:06danvet: I don't judge what people get high on :-P
13:12danvet: dolphin, merged everything to drm-next, up to you whether you want to wait for intel-ci to first approve the result in drm-tip or backmerge right away
13:19jannau: is there a use case which requires hdcp but not some form of protected video decoding? I wasn't planning to reverse hdcp support in dcp (apple display), at least for now. there are so many more important things to do first
13:22alyssa: jannau: HDCP being High DCP, for when you drink too much yarr mate?
13:24alyssa:will stop shitposting
13:24alyssa: jannau: Ostensibly, no. In practice there are some broken HDMI monitors that want HDCP even when not displaying DRM video.
13:24alyssa: It's not something I've personally encountered, nor do I particularly want us to carry those patches.
13:25alyssa: But rumour has it that they exist.
13:25daniels: so they'll refuse to display YUV streams unless they're accompanied by HDCP, or?
13:26alyssa: daniels: I thought I heard this rumour from you, if I'm making stuff up I'd be glad to be wrong
13:26daniels: alyssa: it's certainly news to me
13:26alyssa: oh
13:26alyssa: then i'm definitely making stuff up
13:26alyssa:awkwardly shuffles away
13:27daniels: I mean there's a ton of breakage around HDMI but that one would be novel
13:29alyssa: daniels: I think it was the comments on https://www.collabora.com/news-and-blog/blog/2017/12/11/why-linux-hdcp-isnt-the-end-of-the-world/
13:29daniels: ah, maybe this was it: 'It's actually a net gain for freedom lovers. I've encountered A/V gear that only does 48 kHz, 16 bit stereo (or less) PCM if you don't do HDCP. Negotiate HDCP and you can do bitstream audio, or the full 192 kHz 8 channel 24 bit PCM.'
13:30alyssa: Yeah. Comment on your blog post from someone I don't know, something you said to me, potato tomato
13:30alyssa: :p
13:30daniels: by the transient property, I am the internet
13:30alyssa: Yes of course
13:33tzimmermann: melissawen, emersion, why would you want to move planes among crtcs? seems over-complex to me. especially for a pure software driver with no hardware constraints
13:34emersion: tzimmermann: some hw supports it, so it's good if we can test it
13:34daniels: yeah, RPi is the usual poster child for this, but there's definitely some other hardware around that can as well
13:34emersion: sometimes you have 4 planes which can move between CRTCs, they are all on CRTC 1, but your fancy video player is on CRTC 2
13:35emersion: so you want to make use of the planes on that CRTC 2
13:35daniels: I wouldn't say it would be my first priority for vkms, but it does make sense to do at some point at least
13:35emersion: yeah, it's more of an advanced feature tbh
13:35tzimmermann: for testing, i see.
13:36tzimmermann: i'd start with multiple outputs and static planes TBH.
13:38tzimmermann: you can only bind a plane to a single crtc at the same time. and the exact limits depend on the HW. is there really much to test with vkms?
13:40tzimmermann: just checked: there's a possible_crtcs mask in struct drm_plane. that's certainly something to test
13:41tzimmermann: if there would be a max_planes limit for each crtc and the device as a whole, the drm core might be able to validate HW limits automatically
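The `possible_crtcs` field mentioned above is a bitmask in which bit i stands for the CRTC with index i. The following userspace sketch models the two kinds of hardware from the discussion (planes fixed to one CRTC versus freely reassignable planes). The helper names `plane_can_use_crtc`, `fixed_plane_mask` and `movable_plane_mask` are hypothetical and are not DRM core functions; only the bit-per-CRTC-index encoding is taken from DRM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can a plane with this possible_crtcs mask be put on CRTC `crtc_index`? */
static bool plane_can_use_crtc(uint32_t possible_crtcs, unsigned crtc_index)
{
    assert(crtc_index < 32);
    return (possible_crtcs & (UINT32_C(1) << crtc_index)) != 0;
}

/* Mask for a plane that is hard-wired to a single CRTC. */
static uint32_t fixed_plane_mask(unsigned crtc_index)
{
    assert(crtc_index < 32);
    return UINT32_C(1) << crtc_index;
}

/* Mask for a plane that may be moved to any of the first num_crtcs CRTCs. */
static uint32_t movable_plane_mask(unsigned num_crtcs)
{
    assert(num_crtcs >= 1 && num_crtcs <= 32);
    if (num_crtcs == 32)
        return UINT32_MAX;
    return (UINT32_C(1) << num_crtcs) - 1;
}
```

A test that walks every plane and CRTC pair and compares the outcome of a commit against this predicate is one way a virtual driver could exercise both configurations.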
13:49alyssa: I'm feeling really deletionist
14:28karolherbst: alyssa: hey, wanna delete some code?
14:28danvet: lina, is IntoGEMObject the base class trait I got confused about?
14:28danvet: about the lookup_handle discussion
14:29alyssa: karolherbst: Yippee!
14:29karolherbst: how much work do you want to do before you are able to delete a bunch of code?
14:31danvet: dolphin, if you haven't backmerged yet I'm about to merge rodrigo's pull
14:31danvet: so perhaps hold off
14:32alyssa: karolherbst: Preferably none!
14:32alyssa: :p
14:34karolherbst: mhhhh
14:35alyssa: what do you want to delete?
14:35alyssa: is it gallium/drivers/tegra
14:35alyssa: you should def delete it
14:35karolherbst: was thinking about clover
14:36karolherbst: but that works as well
14:36karolherbst: fight with thierry over it
14:36alyssa: Ooooh deleting clover sounds like fun for the whole family
14:36karolherbst: yeah
14:36karolherbst: just needs some stuff to be fixed first
14:36karolherbst: like landing HMM support (ugh, but I really just have to test and land it) and getting radeonsi/r600 supported
14:37alyssa: :D
14:37karolherbst: but
14:37karolherbst: it's 16k loc
14:37karolherbst: I'll look into radeonsi this weekend and see if it can be enabled already or not
14:37karolherbst: I have no GPU to check with r600
14:44daniels: aiui gerddie is the only one who ever touches r600, and he's away for a few weeks
14:44tagr: dcbaker: I don't want to seem impatient, but since we both dropped the ball on this in the past, gentle reminder about this: https://github.com/mesonbuild/meson/pull/9993
14:44daniels: apparently babies are more important than ancient GPUs or something
14:45daniels: tagr: ahhhh thanks for fixing that!
14:45dcbaker: tagr: thanks for reminding me. I’m way behind on everything :/
14:45dcbaker: I think it might have to wait till the release freeze is over, but I can queue it for the next release
14:46robclark: tursulin, gfxstrand: have we come to a conclusion on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21973 ?
14:48tagr: dcbaker: at this point I don't care when it goes in, I just want it off my plate, to be honest =)
14:49tagr: daniels: huh... didn't realize anyone else was running into that, but glad I could help
15:32alyssa: I wonder how much faster NIR would be if it weren't so darn chunky
15:36dottedmag: alyssa: Sounds funny, given that Nir is a pretty common name in Israel.
15:37alyssa: dottedmag: ani lo midiberet ivrit
15:37tursulin: robclark: in the latest version setting of deadline is no longer unconditional but depends on non-zero timeout in the caller? Does it fix clvk like that?
15:37tursulin: *still
15:37tursulin: s/fix/improve/ I guess
15:38robclark: tursulin: yes, that is sufficient for clvk while not boosting someone who is just querying if a syncobj is signaled
15:39tursulin: hm timeout zero means just query somehow and not infinite wait?
15:40robclark: right
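The rule discussed above (a zero timeout is a pure query and gets no deadline; a non-zero timeout is a real wait and does) can be sketched in C. This is only an illustration of the decision, not the code from the merge request: `toy_wait_plan` and `toy_plan_wait` are hypothetical names, and no real DRM or Mesa call is made.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of what a wait call would do. */
struct toy_wait_plan {
   bool blocks;        /* caller is willing to sleep until signaled */
   bool sets_deadline; /* caller asks the kernel to boost towards a deadline */
};

/* Assumption: timeout_ns == 0 means "just poll the syncobj state". */
static inline struct toy_wait_plan
toy_plan_wait(uint64_t timeout_ns)
{
   struct toy_wait_plan plan;

   plan.blocks = timeout_ns != 0;
   /* Only a real wait sets a deadline; a query must not boost anything. */
   plan.sets_deadline = plan.blocks;
   return plan;
}
```

With this shape, a caller that only checks whether a syncobj is signaled passes 0 and never triggers a boost, while a blocking wait such as the one behind clFinish passes a non-zero timeout and does.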
15:43tursulin: robclark: think I am okay with that, I was more concerned about the state of things in the version which had os_time_get_absolute_timeout(0) passed unconditionally to drm_syncobj_timeline_wait
15:45tursulin: I don't have a good view on how clFlush then translates to some VK API in clvk
15:45tursulin: clFinish actually
15:45tursulin: I guess it is not passing zero but "infinite future" in the implementation?
15:47alyssa: gfxstrand: cwabbott: What happened with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/623 ?
15:49gfxstrand: alyssa: buried in the sands of time
15:49alyssa: :|
15:49alyssa: Trying to rebase jump_if sounds painful
15:49gfxstrand: yeah....
15:50alyssa: But just deleting if_uses doesn't actually look too bad
15:50alyssa: and adding a `bool is_if` to nir_src to distinguish
15:50robclark: tursulin: I'm not sure _exactly_ what the call path from cl is (mattst88 might know).. but it is at least waiting so it must be using some non-zero timeout ;-)
15:50alyssa: actually I typed out a good chunk of that patch before bothering to search for prior art :p
15:50gfxstrand: alyssa: Yeah, deleting if_uses would be amazing.
15:51alyssa: gfxstrand: So no objection to doing that without the jump_if stuff?
15:51alyssa: deleting 16 bytes from nir_ssa_def at the cost of 1 byte in nir_src is still a win
15:51gfxstrand: Yeah
15:52alyssa: (and could steal a bit from is_ssa)
15:52gfxstrand: alyssa: It's not even 1B, put it next to is_ssa and it'll get lost in the padding
15:52gfxstrand: Evil thought: Use the bottom bit of the pointer. :)
15:52gfxstrand: Same for reg/ssa
15:54alyssa: I did think about that yes
15:54alyssa: One thing at a time
16:02jenatali: :O How did a dzn test job finish in 1:38... that's insanely fast
16:13gfxstrand: alyssa: We could do that for ssa/reg as well with ssa having 0 in the bottom bit, of course.
16:14gfxstrand: alyssa: Sounds dangerous... I kinda love it. :evil_grin:
16:15gfxstrand: But let's start with the bool. It doesn't bloat anything (gets lost in the padding) and should be relatively safe.
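The two options discussed above (a plain `bool is_if` next to the parent pointer, or a tag in the bottom bit of the pointer) can be sketched in C. The `toy_*` types and helpers are hypothetical stand-ins, not the real nir_src or nir_ssa_def definitions; the sketch assumes the pointed-to objects are at least 2-byte aligned so the low bit is free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for an instruction and an if node. */
struct toy_instr { int id; };
struct toy_if { int id; };

/* Option A: explicit flag. On typical ABIs the bool sits in padding
 * that would exist anyway if other small fields follow the pointer. */
struct toy_src_bool {
   void *parent; /* struct toy_instr * or struct toy_if * */
   bool is_if;
};

/* Option B: the flag lives in the bottom bit of the parent pointer. */
struct toy_src_tagged {
   uintptr_t parent_and_tag; /* bit 0 set means the parent is an if */
};

static inline void
toy_src_set_parent_instr(struct toy_src_tagged *src, struct toy_instr *instr)
{
   assert(((uintptr_t)instr & 1) == 0); /* low bit must be free */
   src->parent_and_tag = (uintptr_t)instr;
}

static inline void
toy_src_set_parent_if(struct toy_src_tagged *src, struct toy_if *nif)
{
   assert(((uintptr_t)nif & 1) == 0);
   src->parent_and_tag = (uintptr_t)nif | 1;
}

static inline bool
toy_src_is_if(const struct toy_src_tagged *src)
{
   return (src->parent_and_tag & 1) != 0;
}

static inline struct toy_instr *
toy_src_parent_instr(const struct toy_src_tagged *src)
{
   assert(!toy_src_is_if(src));
   return (struct toy_instr *)src->parent_and_tag;
}

static inline struct toy_if *
toy_src_parent_if(const struct toy_src_tagged *src)
{
   assert(toy_src_is_if(src));
   return (struct toy_if *)(src->parent_and_tag & ~(uintptr_t)1);
}
```

Option B saves the flag entirely but forces every access through a helper that masks the tag, which is the "dangerous" part; option A keeps ordinary pointer access and is the safer first step.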
16:17alyssa: yep yep working on it
16:29alyssa: ok, typed something out. completely untested but should work right? :P
16:31alyssa: it compiles, must be perfect
16:33gfxstrand: Ship it!
16:33gfxstrand: :P
18:06jenatali: Ugh. "CI is taking too long" by literal seconds
18:06jenatali: That timeout needs to be longer
18:22daniels: jenatali: it doesn't; stuff needs to get fixed
18:23jenatali: That'd work too
18:23daniels: 60 minutes is already way too long; if we push it to 90 minutes, then people will come to rely as much on 90 as they do on 60, and we'll only be able to merge 18 MRs per day
18:23anholt: *attempt to merge :)
18:23daniels: ha!
18:23daniels: one of the biggest issues atm is that stoney+tgl+jsl have wildly unreliable UART, so using SSH for that is nearly done
18:24anholt: daniels: I assume our a618 hangs waiting for serial are also unreliable uart
18:24jenatali: Fair
18:24daniels: anholt: yeah, presumably
18:25daniels: anholt: but speaking of papering over breakage and a618, I was just staring at https://gitlab.freedesktop.org/mesa/mesa/-/pipelines/850042 - if you look at the fdno section there, a ton of a618 jobs failed with deqp complaining of empty caselist sets and 'is your caselist out of sync with your deqp binary?' - which all succeeded on retry. has anything changed in deqp-runner lately which would cause that?
18:26anholt: dEQP error: FATAL ERROR: Failed to initialize dEQP: Empty test case name
18:26anholt: well that's new
18:26anholt: 2023-04-06 16:03:23.669193: zstd: /*stdout*\: No space left on device
18:26jenatali: Oof
18:27anholt: this is something where we should probably be grepping for that string and highlighting it in our ci daily reports. or something even more emphatic.
18:28alyssa: I thought the CI rootfs was ephermal?
18:28alyssa: ephemeral
18:28anholt: alyssa: it's on nfs.
18:28alyssa: uff
18:28alyssa: OK
18:28anholt: turns out you can't do much if you load the rootfs in ram. and loading rootfs onto real storage is s l o w
18:29anholt: (also, how many rmw cycles do you get on your emmc? wanna find out?)
18:29alyssa: Yeah..
18:29alyssa: anholt: It looks like shader-cache is enabled in CI
18:29alyssa: It should not be
18:30anholt: it needs to be enabled in ci, both for coverage purposes and for perf purposes.
18:30alyssa: (at least, grepping .gitlab-ci I see no hits for shader-cache and it's default on i think)
18:30alyssa: coverage purposes I could see... perf how?
18:31anholt: turns out caching shaders is useful for being able to get through the ctses faster.
18:31anholt: they do like to repeat themselves a little bit.
18:31alyssa: hrm.
18:31alyssa: shader cache in a tmpfs then?
18:31daniels: anholt: oooooohhhh, ok
18:31daniels: let me go chase that up
18:32anholt: alyssa: yes, that is what we do.
18:32alyssa: got it
18:32daniels: anholt: thanks for having better comprehension than I do :)
18:33daniels: koike: ^ 'no space on device' is probably something to add to our infra-error greps
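The idea of grepping job logs for infrastructure errors can be sketched in C as a substring check against a small pattern list. This is not the actual CI tooling: `toy_infra_error_patterns` and `toy_line_is_infra_error` are hypothetical names, and the only pattern listed is the string quoted in the log above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of strings that indicate an infra problem
 * rather than a driver bug. */
static const char *const toy_infra_error_patterns[] = {
   "No space left on device",
};

/* Returns true if the log line contains any known infra-error string. */
static inline bool
toy_line_is_infra_error(const char *line)
{
   size_t count = sizeof(toy_infra_error_patterns) /
                  sizeof(toy_infra_error_patterns[0]);

   for (size_t i = 0; i < count; i++) {
      if (strstr(line, toy_infra_error_patterns[i]) != NULL)
         return true;
   }
   return false;
}
```

A report generator could run this over each line of a failed job and flag the job as an infra failure instead of a test failure.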
18:33alyssa: (Mostly asking because failing to disable the shader cache caused my CTS runs to tank as soon as I filled up.)
18:33karolherbst: so uhm.. I think I'm ready to merge radeonsi support for rusticl 🙃
18:33alyssa: karolherbst: r-b
18:34karolherbst: but now I have to figure out meson stuff...
18:35karolherbst: dcbaker: so, I think I know why I'm seeing this `isystem` problem with meson. The compiler args from dependencies are fetched differently for bindgen and C/C++ targets
18:35karolherbst: so I end up with different flags
18:36dcbaker: are you getting isystem when you don't expect it, or not getting isystem when you do?
18:36karolherbst: but I've also seen that atm bindgen assumes it's generating bindings for a C file as well. So I think it might need some C vs C++ detection.
18:36karolherbst: isystem when I don't
18:36karolherbst: but the issue is really that the flags are different compared to when the same dep is used for a normal c/c++ target
18:37dcbaker: yeah, I think when I initially implemented the bindgen wrapper C++ was still in the "don't use this yet" stage so I punted it
18:37karolherbst: right
18:37karolherbst: well.. by default it autodetects it by the file suffix
18:37karolherbst: otherwise you can always explicitly state it as a compiler flag
18:38karolherbst: not sure if meson should care anyway
18:38dcbaker: I think we want to detect it by suffix, we already have the code to do that.
18:38karolherbst: sure, but you can always pass it as a c arg into the target
18:38karolherbst: but then it's weird anyway
18:39karolherbst: but that's not a problem atm. The problem which needs to be solved first is that isystem one, because that breaks compiling with a custom llvm once I pass the dep in
18:40karolherbst: also.. landing this would help as well: https://github.com/mesonbuild/meson/pull/11441 :D
18:40dcbaker: I suspect there are things we're already not handling correctly that we need that information for, namely I think there's now an add_project_dependencies() which takes a language argument and we should probably at the very least be pulling out the compiler flags for those
18:41karolherbst: ahh
18:41dcbaker: so knowing whether we need C or C++ would matter there
18:42dcbaker: I don't think that I can land the b_ndebug thing until after 1.1.0 ships (rc2 just came out)
18:42karolherbst: yeah, that's fine
18:42karolherbst: that one isn't really critical
18:42dcbaker: I've been kinda off the ball on stuff, just a lot of RL stress right now
18:42karolherbst: fair
18:43dcbaker: but I am personally annoyed that it didn't get in since it's on the milestone, has multiple non-maintainer reviews, and is not complicated
18:44dcbaker: The isystem stuff on the other hand is probably a bug fix, so we might be able to get that done sooner
18:48karolherbst: yeah.. I can try to figure it out, but I think the solution here would be to generate the compiler flags identical to how it's done in C/C++ targets
18:48karolherbst: kinda weird that it's handled differently tbh
18:49karolherbst: anyway.. I need it for shader caching stuff, because... uhhh.. any other way would be even more terrible
19:44DemiMarie: lina: my recommendation is to include enough protection in the kernel to make sure that the display controller cannot be crashed
20:54jenatali: Wee -600 lines from my CI xfails :)
21:09alyssa: Woo
21:44alyssa: help i'm trapped in a coccinelle factory
23:24alyssa: gfxstrand: OK, next deflate-NIR idea
23:25alyssa: AFAICT, there is no NIR pass that uses both pass_flags and instr indexing
23:25alyssa: Why not merge into a single field?
23:25alyssa: that is, the few passes that need instructions indexed use their pass flags for the index (with pass flags expanded to 32-bit to compensate)
23:26alyssa: ..wait, that doesn't actually save anything because of packing, still have instr type dammit
23:27anholt: alyssa: since you're in the area, you know about pahole, right?
23:27alyssa: yes that's what I was looking at that made me think about it :)
23:28alyssa: and has now alerted me to a much bigger waste, nir_reg_src being embedded inside nir_src directly
23:28alyssa: it's in a union with nir_ssa_def*, but one's a pointer (8 bytes) and the other is the whole structure (24 bytes)
23:28alyssa: => we're wasting 16 bytes per nir_src on SSA compilers
23:29alyssa: which is even more significant than the thing I found today
23:29jenatali: :(
23:29alyssa: The easy fix is to change to a nir_reg_src*, which fits snugly in the union at the cost of a bit more indirection when nir_reg_src is actually used
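The size argument above can be checked with a small C sketch. The `toy_*` structs are hypothetical stand-ins: `toy_reg_src` is only assumed to have the shape implied by the discussion (a register pointer, an indirect pointer, and a base offset), which comes to 24 bytes on a typical 64-bit ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real NIR types. */
struct toy_ssa_def { int index; };
struct toy_register { int index; };
struct toy_src;

/* Assumed shape of the register source: two pointers plus an offset. */
struct toy_reg_src {
   struct toy_register *reg;
   struct toy_src *indirect;
   unsigned base_offset;
};

/* Before: the whole struct is embedded, so the union is as large as
 * toy_reg_src even when only the SSA pointer is ever used. */
union toy_src_embedded {
   struct toy_reg_src reg;
   struct toy_ssa_def *ssa;
};

/* After: both members are pointers, so the union is pointer sized. */
union toy_src_pointer {
   struct toy_reg_src *reg;
   struct toy_ssa_def *ssa;
};

/* Bytes saved per source by switching to the pointer form. */
static inline size_t
toy_bytes_saved_per_src(void)
{
   return sizeof(union toy_src_embedded) - sizeof(union toy_src_pointer);
}
```

On a typical LP64 target this comes to 24 - 8 = 16 bytes per source, matching the number in the log; the price is one more pointer dereference whenever the register form is actually used.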
23:30alyssa: The hard fix is to get rid of nir_reg_src but that's unfortunately a "later" problem. We could get most of the benefit if we got rid of register offset+indirect, but I think that might be doing something for a few backends (intel vec4, and maybe ir3)
23:32alyssa: and maybe TTN and r600/sfn .. mumble
23:32anholt: nir_lower_locals_to_regs says intel, r600, ir3, ntt, etc.
23:32alyssa: yeah. definitely not a project for any time soon
23:34alyssa: but that easy fix should be doable and get most of the win
23:36gfxstrand: alyssa: Does anyone actually use the indirect crap?
23:36gfxstrand: maybe ir3?
23:37alyssa: gfxstrand: see 2 lines up
23:37alyssa: the better question I suppose is whether it's load bearing
23:37gfxstrand: alyssa: Maybe this is a good first thing to bite off to try and get rid of nir_reg_src.
23:37gfxstrand: Relegate array registers to a special instruction.
23:37alyssa: Yeah, that could be a decent approach
23:37alyssa: Definitely not an April project. Maybe a May one
23:37gfxstrand: That would probably only require touching 2-3 back-ends.
23:38gfxstrand: And I'm far less worried about getting copy-prop exactly right for the array case.
23:38alyssa: at least 4 backends see above
23:38gfxstrand: Right
23:38gfxstrand:wishes the Intel vec4 back-end would die already
23:38gfxstrand: We could probably make that one use load/store_scratch
23:41alyssa: 23:37 alyssa: the better question I suppose is whether it's load bearing
23:41alyssa: I.e. are there real workloads that run on that class of hardware that have their performance materially improved from the indirects
23:41gfxstrand: The Intel vec4 back-end is affected. We can't just turn on indirect lowering.
23:42alyssa: is this a thing games use?
23:42gfxstrand: But like I said, we can go the load/store_scratch path, we just need to wire it up.
23:42gfxstrand: Yes
23:42gfxstrand: sadly
23:42alyssa: Boo
23:42alyssa: doesn't load_scratch hit the stack..?
23:42gfxstrand: You'd have to dig back through the history. This is an anholt thing from like 10 years ago.
23:42gfxstrand: Yes, what the vec4 back-end does today is use load/store_scratch for arrays.
23:42gfxstrand: It just does it in the back-end which is dumb.
23:45alyssa: Oh. That's just wrong :p
23:45alyssa: yes, we can fix that then
23:45alyssa: i thought it had actual indirect access to the GPRF
23:46gfxstrand: It does but we don't use it because that would be insane.
23:47alyssa: Right.. if it's just a question of wiring up 2 intrinsics and calling the lowering pass, all the more reason to wean it off
23:47alyssa: i've just discovered brw_clip i am closing the file
23:51alyssa: this is too terrible what
23:53alyssa: ok. sure. we can fix this for intel vec4 fine
23:53gfxstrand: Yeah, stay clear of brw_clip.c
23:53gfxstrand: There be dragons and they're hungry!
23:55alyssa: Seemingly ir3 really does do GPRF indirect access, but also ir3 is a good backend so I'm not worried about plumbing in load_array/store_array intrinsics for it to get the same effect