IRC Logs of #dri-devel on irc.freenode.net for 2023-01-19

00:20 alyssa: Can I trade some NIR review with someone?
00:20 alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016
00:21 alyssa: "nir/lower_blend: Clamp blend factors" and "nir/lower_blend: No-op nir_color_mask if no mask"
00:21 alyssa: Gert reviewed the rest already :-)
00:27 alyssa: cwabbott: Can I get an ack/nak on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20562/diffs?commit_id=718fa2bd1fdcd81034761f518503fcf7c675a540 ?
00:27 anholt: alyssa: feels like your blend color you're passing as a uniform should be clamped outside the shader?
00:27 alyssa: anholt: that doesn't solve the problem for SNORM
00:28 alyssa: SNORM format, constant colour of -0.5 (which is in bounds for SNORM)
00:28 alyssa: and then the blend factor is ONE_MINUS_CONSTANT = 1 - (-0.5) = 1.5 which is out of bounds and needs to be clamped to 1.
00:29 anholt: whee. ok.
00:29 alyssa: For UNORM it would suffice to clamp in the driver
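To make the SNORM case concrete, here is a small Python sketch of the arithmetic above. It assumes the usual normalized ranges ([0, 1] for UNORM, [-1, 1] for SNORM) and is only an illustration; it is not the nir_lower_blend implementation, and the function names are made up.

```python
# Normalized ranges per format class; illustrative, not taken from Mesa.
FORMAT_RANGE = {"unorm": (0.0, 1.0), "snorm": (-1.0, 1.0)}


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def one_minus_constant(constant, fmt):
    """ONE_MINUS_CONSTANT blend factor, clamped to the render target's range."""
    lo, hi = FORMAT_RANGE[fmt]
    c = clamp(constant, lo, hi)    # what a driver-side clamp of the constant gives
    return clamp(1.0 - c, lo, hi)  # the factor itself can still leave the range


print(one_minus_constant(-0.5, "snorm"))  # 1.0 (unclamped it would be 1.5)
print(one_minus_constant(0.25, "unorm"))  # 0.75
```

For UNORM the second clamp never changes anything once the constant is in [0, 1], which is why clamping in the driver would suffice there.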
00:30 alyssa: right now lower_blend has two actual users, the mali and agx gl drivers
00:30 alyssa: mali is specializing the blend shader to the blend constant (big yikes, but arm does the same thing and it hasn't blown up in our face yet...)
00:30 alyssa: so it will constant-fold out
00:31 alyssa: and agx uses nir_opt_preamble so the fsat will happen in the preamble shader and be basically free, except for eating some extra uniform slots
00:31 alyssa: I guess it's worse for panvk since preambles aren't free on mali
00:32 alyssa: (but also, there's more driver overhead/complexity in doing the clamp there since it's format-dependent, and I doubt anything cares; nothing is fsat-bound.)
00:33 alyssa: dr. ekstrand, or how i learned to stop worrying and love my shitty SIMD VLIW hardware
00:52 alyssa: let me know if there's review I can tat with
03:06 airlied: okay finally got anv h264 decode fixed
03:17 kisak: vulkan video?
03:19 psykose: straight from the volcano
03:19 kisak: onto the anvil, forged into a bitstream
03:21 psykose: Industry!
03:23 Lynne: it may sound like a small feat, but it isn't really, since it makes intel's bad vaapi drivers redundant
03:26 psykose: and you should be proud of yourself
03:44 airlied: Lynne: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20782 anv h264
03:56 jenatali: Man, CI is a mess right now
03:57 zmike: hit retry again you coward
03:57 HdkR: Hammering the CI harder until it works
03:57 zmike: title of my biography
03:59 jenatali: Is there an easy way to see how many jobs I've retried in this pipeline? Pretty sure it's close to 10
04:00 airlied: let's just get marge to read the logs, spot 500/503s and retry the job straight away :-
04:00 airlied: :-P
04:01 airlied: but yes it's definitely in a low point
04:01 jenatali: It's not even that. One of them was a zink traces run where one trace was off by 4 pixels
04:01 psykose: if you click a pipeline the "jobs" tab has a new job per try
04:01 psykose: so.. increasing number = more retries :p
04:01 jenatali: And a stoney job that timed out after pretty much succeeding
04:02 jenatali: I'm just trying to get my Phoronix article written :P
04:06 airlied: Lynne: h265 on anv might be less fun, looks like it only has slice level hw (i.e. long format)
04:08 jenatali: Finally
04:08 jenatali: 9 re-runs
04:29 airlied: Lynne: okay you can get h265 non slice if you have huc fw enabled
04:29 airlied: this is going to be one of those rabbit holes I'd rather not fall down
05:39 Lynne: huc fw?
05:40 airlied: some video decode/encoder fw blob
05:40 airlied: mostly used for lower power usage
05:41 Lynne: so, not commonly installed?
05:41 airlied: I think tigerlake+ it might be used, trying to work out where it happens on dg2 etc
05:42 Lynne: the mess spans even to binary blobs
05:42 airlied: might be easier to propose slice mode for h265 :-P
05:43 airlied: I think av1 dec might be possible without huc, but it's a long thread to pull
05:45 Lynne: the vulkan way for lice decoding would be to have a semaphore signalled on each, with a final decode operation which waits on all, does deblocking and film grain, and signals the output frame semaphore
05:45 Lynne: *lice
05:46 Lynne: SLICE
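Lynne's proposed model (one decode operation per slice, each signalling its own semaphore, plus a final operation that waits on all of them) is essentially a small dependency graph. The sketch below models it in plain Python with made-up operation and semaphore names; it does not use any Vulkan API.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    waits: list = field(default_factory=list)    # semaphores this op waits on
    signals: list = field(default_factory=list)  # semaphores this op signals


def build_slice_graph(num_slices):
    """One decode op per slice, then a final op that waits on every slice."""
    ops = [Op(f"decode_slice_{i}", signals=[f"slice_sem_{i}"])
           for i in range(num_slices)]
    ops.append(Op("deblock_and_film_grain",
                  waits=[f"slice_sem_{i}" for i in range(num_slices)],
                  signals=["frame_sem"]))
    return ops


def run(ops):
    """Execute ops in any order that respects their waits; return that order."""
    signaled, order, pending = set(), [], list(ops)
    while pending:
        ready = [op for op in pending if all(s in signaled for s in op.waits)]
        if not ready:
            raise RuntimeError("deadlock: an op waits on a semaphore nobody signals")
        for op in ready:
            order.append(op.name)
            signaled.update(op.signals)
            pending.remove(op)
    return order


print(run(build_slice_graph(3)))
```

The slice decodes are independent of each other, so an implementation is free to schedule them in any order; only the final deblocking/film-grain step is serialized behind them.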
05:49 airlied: oh so you'd put one slice per command buffer
05:49 airlied: not sure how the hw/fw would deal with that though
05:50 airlied: I think the hw would want all the slices in one command buffer
05:52 airlied: at least with the intel slice mode, you just add more slice bits to the command buffer and the bLastSlice bit gets set on the last one
05:53 Lynne: really? I thought it did scheduling on a per-slice basis
06:01 airlied: from the programming info I don't see it
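A hypothetical sketch of the single-command-buffer approach airlied describes: one entry per slice, with a last-slice flag set only on the final entry. The field names are illustrative and are not the actual Intel command layout.

```python
def record_slices(slice_offsets, bitstream_size):
    """Build one command list for a frame: one entry per slice, where each
    slice runs from its offset to the next slice's offset (or the end of the
    bitstream), and only the final entry carries the last-slice flag."""
    cmds = []
    for i, offset in enumerate(slice_offsets):
        is_last = i == len(slice_offsets) - 1
        end = bitstream_size if is_last else slice_offsets[i + 1]
        cmds.append({"offset": offset, "size": end - offset, "last_slice": is_last})
    return cmds


for cmd in record_slices([0, 100, 250], 400):
    print(cmd)
```

Because the entries only carry offsets into the bitstream, the slice data itself never has to be copied.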
06:02 Lynne: welp, it may be doing it on a firmware level then
06:02 Lynne: I don't mind the single command buffer approach either
06:02 Lynne: just as long as I don't have to copy the slice data...
06:04 Lynne: wish I could prefetch host-mapped data onto the GPU
06:04 airlied: I think having slice offsets into the bitstream would be fine
06:04 airlied: better having vaapi decode the slice headers than the vulkan driver doing it
06:09 Lynne: you mean the short mode?
06:10 airlied: Lynne: short mode would work with current interface, but the only hw support for shortmode needs huc fw
06:11 airlied: the long mode hw works without huc fw, but we can't program that without the vulkan driver decoding slice headers
06:11 airlied: and I went down that path for h264 and am quite certain it shouldn't be done anywhere else :-P
06:15 Lynne: is the huc fw incompatible with other firmware?
06:16 robclark: Lynne: when it comes to video "lice" decoding might not be such a typo :-P
06:23 Lynne: https://nitter.snopyta.org/daemon404/status/599969174723567616#m
06:25 airlied: Lynne: no it should work fine once it's loaded I think
06:26 airlied: Lynne: also I think intel av1 decode might require a batch buffer per tile
06:26 airlied:is slowly finding the useful comments in the sea of crpa
06:26 airlied: crap
06:27 Lynne: so the huc fw is some additional firmware that would need to get loaded alongside the regular fw?
06:28 Lynne: that wouldn't be so bad, it would take a while, but downstream would ship it
06:29 airlied: I think it's already shipped for tigerlake+
06:29 airlied: and probably for older just not enabled
06:30 airlied: and on DG2 there is some other way of loading it, which I'm not sure I understand, need to play with my dg2 a bit more to even see if it claims hevc short
06:31 airlied: "Currently, each batch buffer can contain only 1 tile to be processed, it cannot contain more than 1
06:31 airlied: tile or the entire tile group of tiles"
06:31 airlied: there we are, this will make the AV1 vulkan interface need some more thought
06:32 airlied: you have to resend all the frame state before each tile
06:32 Lynne: batch buffer == command buffer?
06:32 airlied: yes
06:33 Lynne: oof
06:34 Lynne: so you'd need my theoretical slice decoding model
06:34 Lynne: or firmware intervention
06:35 airlied: I'm not sure if you'd use secondary command buffers here or not
06:37 airlied: hmm reconciling intel and amd on this should keep the TSG busy :-P
06:38 airlied: though it might be possible to just do it with a flag
06:38 airlied: saying MULTIPLE_TILES_IN_CMD_BUFFER_ALLOWED
06:38 airlied: then if you don't have that, record a separate command buffer per tile with StdVideoDecodeAV1MESATileList.nb_tiles == 1
06:39 airlied: I think that model would work for both, since you need to resend the frame info for each tile on intel
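The per-tile fallback airlied proposes could look roughly like this. MULTIPLE_TILES_IN_CMD_BUFFER_ALLOWED comes from the chat and is only a proposal, not an existing Vulkan flag; the dictionaries stand in for recorded command buffers.

```python
# Hypothetical capability bit, as floated in the discussion above.
MULTIPLE_TILES_IN_CMD_BUFFER_ALLOWED = 1 << 0


def record_av1_frame(frame_state, tiles, caps):
    """Return a list of recorded command buffers for one AV1 frame."""
    if caps & MULTIPLE_TILES_IN_CMD_BUFFER_ALLOWED:
        # One command buffer carries the frame state and every tile.
        return [{"frame_state": frame_state, "tiles": list(tiles)}]
    # Otherwise (the Intel case): one tile per command buffer, with the full
    # frame state re-sent in each, i.e. a tile list with nb_tiles == 1.
    return [{"frame_state": frame_state, "tiles": [tile]} for tile in tiles]


print(len(record_av1_frame("frame", ["t0", "t1", "t2"], 0)))
```

Both paths produce the same sequence of tiles; they differ only in how many command buffers carry them and how often the frame state is repeated.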
06:39 Lynne: I'm pretty sure this ought to be fixable in firmware
06:46 airlied: I don't think AV1 has explicit fw in this case
06:46 airlied: the programming mode looks pretty fixed in hw registers
06:47 Lynne: still, it's an odd limitation
06:48 airlied: I'll dig into how media-driver does it tomorrow, I'm just reading the public register docs
06:49 Lynne: how does dxva2 work with av1 on intel then? in our code, we concat all tiles
07:31 airlied: Lynne: they split it out in the driver?
07:31 airlied: d3d12 I'd be more interested in
07:37 Lynne: does it even have av1 support?
07:38 airlied: not sure
07:38 airlied: don't see any spec
07:39 airlied: ah yes there is some stuff in mesa for it
07:40 airlied: yeah seems to concatenate as well
07:45 airlied: guess someone should write a driver and see :-P
08:52 mripard: tzimmermann: oooh, nice diffstat
08:53 tzimmermann: mripard, it has been a busy week :D
08:54 tzimmermann: well, we removed the ancient UMS drivers.
08:55 mripard: yeah, I can see that, great job
08:55 javierm: tzimmermann: speaking about killing legacy stuff, have you seen: https://www.reddit.com/r/linux/comments/10eccv9/config_vtn_in_2023/ ?
08:56 nirmoy: Hi drm maintainers, Please help with reviewing https://patchwork.kernel.org/project/dri-devel/patch/20230117115350.1071-1-nirmoy.das@intel.com/ I have r-b from Intel folks and Sam but I think I need ack from drm maintainers for this.
08:56 tzimmermann: javierm, no. before i read it, what's the summary?
08:57 javierm: tzimmermann: tl;dr plymouth should be able to handle CONFIG_VT=n now since it was ported to use libinput, we still need a panic handler a la drmlog
08:57 javierm: the latter is the missing piece
08:58 tzimmermann: i see.
08:58 danvet: javierm, also some unclarity about what the standard kmscon should look like
08:58 danvet: my google fu failed me on this cage/foot thing
08:58 danvet: congrats to whomever named that project
08:58 tzimmermann: it's pretty cool that someone actually tried it
08:58 danvet: yes
09:00 javierm: yeah, I tried after plumbers but had issues with weston, can't remember now the details but could check my notes
09:01 javierm: danvet: cage is a display server and foot a terminal emulator
09:01 tzimmermann: javierm, danvet, for generic drmlog, we'd first need better support for vmap/vunmap in several drivers. some drivers don't implement those interfaces well
09:01 javierm: danvet: https://github.com/Hjdskes/cage and https://codeberg.org/dnkl/foot
09:01 tzimmermann: just saying
09:02 javierm: tzimmermann: yeah, but as long as it works on simpledrm, it may be enough?
09:02 javierm: because I would expect this to be an issue for early boot in general
09:02 tzimmermann: javierm, i think we want to show the log with any driver?
09:02 tzimmermann: no?
09:03 javierm: tzimmermann: yes, but my point was that even with VT, if the is a panic and you are in a wayland session, you may not be able to switch to VT anymore
09:03 javierm: *there is a panic
09:03 danvet: javierm, I still can't google cage :-/
09:03 javierm: danvet: I shared the repos above ^
09:03 javierm: but yeah, cage and foot are really not good names for searchability
09:04 tzimmermann: javierm, that seems fix-worthy as well
09:04 danvet: yeah just saying
09:04 danvet: ah I need to throw in "wayland" for the search to work
09:04 javierm: tzimmermann: absolutely. What I'm trying to say is that it may not be a requirement to disable VT in distros
09:04 tzimmermann: fair point
09:05 javierm: danvet: I expect distros would choose their own combination of wayland compositor, terminal and greeter
09:05 javierm: for example, I guess the Fedora workstation folks would like to use mutter, gdm and gnome-terminal
09:05 javierm: just to avoid maintaining two sets in the distro, etc
09:07 danvet: javierm, I figured the idea is to have something really minimal so better chances it works as emergency
09:07 danvet: like if your mutter is toast because of some library bug
09:07 javierm: tzimmermann, danvet: btw, jfalempe said that he would like to look at drmlog + a panic handler to use it once he has some time
09:08 tzimmermann: sure
09:08 javierm: danvet: yeah, with my kernel/DRM developer hat, I would certainly prefer that
09:08 javierm: but I would also understand the other point since for sure the maintenance of that minimal wayland compositor + terminal + greeter will be done by the desktop folks
09:08 danvet: javierm, wasn't there already a thread on dri-devel? I have vague memories at least
09:09 javierm: danvet: about drmlog you mean or this discussion of the minimal "VT" using KMS ?
09:09 javierm: the latter was discussed in plumbers but I don't remember a thread
09:09 danvet: tzimmermann, uh backmerge overdue?
09:09 danvet: I also need one because trivial conflict in merging the ivpu driver in MAINTAINERS
09:09 danvet: because we don't yet have the accel subsystem entry
09:10 danvet: javierm, nah the minimal kernel console
09:10 danvet: output only, i.e. a true kernel console not a vt console which also does input
09:10 javierm: danvet: yes, Noralf proposed a drmcon at some point
09:11 danvet: tzimmermann, can you pls ping me when the backmerge is done so I can apply the ivpu driver?
09:12 javierm: anyway, I just wanted to share that reddit thread since I saw it on mastodon a few days ago and thought it was interesting :)
09:15 MrCooper: javierm: gnome-terminal seems on the way out, gnome-console is the new hotness :)
09:17 javierm: MrCooper: I actually meant gnome-console, is just that I conflated the names :)
09:17 javierm: MrCooper: but you get the idea, pretty sure you folks won't be interested in maintaining weston, foot, etc. packages in Fedora
09:17 javierm: only to be run in the initrd to have a minimal user-space console
09:19 tzimmermann: danvet, sorry, didn't follow the discussion. You need a backmerge into drm-misc-next?
09:26 danvet: tzimmermann, yeah so I can get the accel stuff needed for ivpu merging
09:26 tzimmermann: ok, working on it
09:27 danvet: thsd
09:27 danvet: *thx
09:28 eric_engestrom: am I missing something, or that VT=n thing means that if there's a bug in mesa (preposterous!) your fallback is toast too?
09:28 eric_engestrom: I wouldn't feel confident doing that for myself, so I'm rather surprised this is being discussed as being done by distros to end users
09:28 MrCooper: there's no requirement for the fallback display server to use Mesa
09:29 eric_engestrom: sure, but if it's mutter then it has to, right?
09:31 eric_engestrom: maybe I'm not awake enough yet, but also: any wayland compositor will need mesa (or another impl) for clients (foot in this discussion) to work
09:32 emersion: eric_engestrom: a display server can use pixman instead of GL/Vulkan
09:32 javierm: eric_engestrom: what emersion said, I'm not that familiar with mutter to know if it can use a pixman backend or not
09:32 emersion: i don't think so
09:32 emersion: wlroots, weston have a pixman renderer
09:32 javierm: but there are efforts to have mutter based minimal compositors such as https://gitlab.gnome.org/GNOME/gnome-kiosk
09:32 eric_engestrom: emersion: ah, right
09:33 eric_engestrom: thx :)
09:33 emersion: foot uses software rendering and no complicated library, which makes it quite nice for this use-case
09:33 javierm: in any case, a gnome-vt or whatever may be preferable to shipping weston or wlroots; at least both gnome-shell and gnome-vt would be based on libmutter
09:34 emersion: :sadface:
09:34 javierm: but this is just my guess, if all distros could share the same minimal console that would be great
09:34 javierm: a kde based distro may want to use kwin in all places, etc
09:35 emersion: i hope non-Fedora distros have a different take :)
09:36 javierm: emersion: btw, I'm not talking for Fedora, not even for the Fedora workstation folks :)
09:36 javierm: this is just my guess, because in general distros tend not to agree on many things
09:37 javierm: emersion: otherwise all distros would have just settled on rpm, which of course is much better :P
09:37 emersion: lol
09:38 javierm: jokes apart, maybe if there's an upstream "user-space vt" project then all distros could adopt it
09:38 javierm: if the maintenance work would not be that big
09:55 danvet: yeah if the emergency compositor/vt stack is exactly the same as your regular desktop it's not going to be of much use
09:56 danvet: kiosk mode or not is kinda orthogonal
10:00 tzimmermann: danvet, backmerge done
10:01 danvet: tzimmermann, thx
10:01 javierm: danvet: it could be a different cadence though, and keep the original version used during installation in the "rescue" entry initrd, just like we do for the kernel
10:01 danvet: hm yeah that might help
10:01 javierm: danvet: just playing devil's advocate to the idea that all distros could agree on a single graphical stack for early boot
10:01 danvet: javierm, yeah that's very unlikely
10:02 danvet: unless kmscon gets resurrected and gets some official blessing
10:02 danvet: probably in systemd-emergencycon or so
10:03 javierm: danvet: yeah, but AFAIU from what Lennart said, they dropped that idea because they noticed that it would need i18n, a display manager, etc. and would basically duplicate what was done by wayland compositors + terminals
10:04 javierm: but having systemd depend on a particular wayland compositor and libvte or whatever could be a trojan horse to make distros agree on a graphical stack for early boot :)
10:04 javierm: at least systemd-based distros, which are the majority
10:05 danvet: javierm, yeah maybe part of plymouth instead, since that has a chunk of the deps
10:05 danvet: and needs glyph printing for logging now anyway
10:07 javierm: danvet: right, that may work indeed
10:09 danvet: javierm, since if you're a non-graphical distro then systemd emergency console on serial or virt serial still works
10:09 danvet: but with graphical you probably have plymouth around or similar
10:10 danvet: so avoids the entire "systemd has now a gl renderer, it's complete bloat" discussion
10:11 javierm: danvet: yeah
12:34 tursulin: danvet, airlied, mripard, tzimmermann: wrt 693e4b42-3883-8a6a-5181-0357e4b88767@linux.intel.com - one new api export from drm plus one i915 fix which uses it - acks/suggestions on which route to take to merge it?
12:35 tursulin: or even s/it/both/
12:49 mripard: tursulin: is it really a fix? it doesn't look like one to me
12:50 tursulin: mripard: 2nd patch is i915 fix which needs the 1st
12:53 mripard: wouldn't it be easier to backport if it was the other way around?
12:56 tursulin: mripard: other way around?
12:56 tursulin: local implementation and move it out to common later?
12:59 mripard: yeah
12:59 mripard: but looking more into your first patch, it's indeed not going to be easy
12:59 mripard: if you want to take it through drm-misc-fixes, fine by me
13:01 tursulin: hm what do you see as not easy in that option?
13:01 tursulin: to clarify, sending both via drm-misc-fixes is fine by you?
13:02 mripard: yes
13:03 tursulin: Cool, could you scoop them up from the ml and merge? They passed our CI and user has reported them working fine so I am reasonably confident it's all good.
13:33 mripard: tursulin: applied and pushed both
13:42 tursulin: mripard: thanks!
13:43 tursulin: hm stable tag did not get captured or you dropped it?
13:43 tursulin: indeed patchwork seems to have ignored it.. okay, will ask for it to be follow up
13:43 tursulin: followed up
14:25 bbrezillon: danvet: not sure you're the right person to ask, but I've been trying to implement a VM_BIND-like interface for pancsf, and I have a few questions
14:30 bbrezillon: first thing to note is that, unlike Xe, the page table setup is all happening on the CPU, and I'm not directly in control of the pte allocations, since it's hidden behind the io_pgtable framework (io-pgtable-arm.c implem in my case). That means I can't easily reserve memory to guarantee the map/unmap operation will succeed.
14:37 bbrezillon: (I'm trying to avoid the extra XE_VM_BIND_FLAG_ASYNC complexity)
14:42 bbrezillon: I mean, I can always flag the VM as inconsistent and fail all future jobs bound to this VM when such errors happen. On the Vulkan side, that would probably lead to a DEVICE_LOST error, which would force a VM recreation, but I'm not sure that's acceptable.
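The fallback bbrezillon mentions (poisoning the VM after a failed page-table update) could be sketched as follows. The class and the choice of EIO are assumptions for illustration only; the actual pancsf driver may report this differently.

```python
import errno


class GpuVm:
    """Sketch of a VM that becomes unusable after a failed page-table update."""

    def __init__(self):
        self.unusable = False

    def apply_update(self, update):
        try:
            update()  # e.g. a map/unmap that may need to allocate page-table memory
        except MemoryError:
            # The failure can no longer be reported to the caller, so poison the VM.
            self.unusable = True

    def submit_job(self, job):
        if self.unusable:
            return -errno.EIO  # assumed error code; userspace would treat it as device lost
        job()
        return 0
```

The cost of this approach is exactly the concern raised in the chat: a transient allocation failure turns into a lost device from the application's point of view.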
14:43 bbrezillon: jekstrand: in case you have any tips on that ^
14:46 dolphin: bbrezillon: are you having a separate set of PTEs for the GPU, still?
14:47 bbrezillon: the second question is more an implementation detail. I see the Xe driver has its own implementation to deal with VMAs. I've been using drm_mm, but I guess there's a good reason for not using it (too heavy, probably). Mind detailing those reasons?
14:47 dolphin: if the application can fully control the PTEs, and they shoot themselves in the foot, shouldn't be a problem
14:47 bbrezillon: dolphin: yes
14:50 bbrezillon: by separate set of PTEs, you mean it's not shared with the CPU, right? We have a page table per GPU VM, and any map/unmap operation can only touch VMs instantiated through a specific DRM fd
14:51 dolphin: yeah
14:52 bbrezillon: the thing is, there are situations where the user didn't do anything wrong, it's just that the kernel implem updating the page table can't allocate memory to map/unmap those pages (unmap can trigger 2MB PTEs into 4K ones, so mem allocation failures can happen there too)
14:53 bbrezillon: *2MB PTEs split into 4K ones
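Why unmap can allocate: in a simplified model of an LPAE-style page table with a 4 KiB granule, a 2 MiB region may be mapped by a single block entry. Unmapping only part of it means replacing that entry with a table of 512 4 KiB entries, and that table page has to be allocated. The counting function below is a hypothetical illustration, not io-pgtable-arm code.

```python
SZ_4K = 4096
SZ_2M = 2 * 1024 * 1024


def tables_needed_for_unmap(start, size, block_mapped):
    """Count the new last-level tables an unmap must allocate.

    block_mapped: set of 2 MiB-aligned addresses currently mapped by a single
    2 MiB block entry. Partially unmapping such a block forces it to be split
    into 512 4 KiB entries, which needs one freshly allocated table page.
    """
    assert start % SZ_4K == 0 and size % SZ_4K == 0
    end = start + size
    needed = 0
    block = start - start % SZ_2M
    while block < end:
        fully_covered = start <= block and block + SZ_2M <= end
        if block in block_mapped and not fully_covered:
            needed += 1
        block += SZ_2M
    return needed


# Unmapping 8 KiB that straddles two 2 MiB blocks splits both of them.
print(tables_needed_for_unmap(SZ_2M - SZ_4K, 2 * SZ_4K, {0, SZ_2M}))
```

Fully unmapping a block just clears its entry; it is the partial case that turns an unmap into an allocation.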
14:54 dolphin: right, if your PTEs are in system memory, then there's no way to predict future errors
14:59 danvet: bbrezillon, yeah that means you need to do pte setup as part of the execbuf ioctl, before the point of no return in drm_sched_job_submit
14:59 danvet: or _arm()
14:59 danvet: (I didn't check that detail again)
14:59 danvet: but also since you have integrated, that shouldn't be a big deal
14:59 danvet: discrete has to do pipelined pte because otherwise you can't pipeline moves, and with vram that's big time suck
14:59 danvet: bbrezillon, also jekstrand knows these things generally too
15:00 danvet: oh also in the vm_bind ioctl itself the point of no return is probably not the sched_submit for you (since you don't do anything pipelined on the gpu I guess)
15:01 danvet: but just the ioctl
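One way to read danvet's advice is a two-phase bind: do every fallible allocation before the point of no return, then commit using only what was reserved. This is a sketch with hypothetical names (TablePool, vm_bind); a real driver would reserve actual page-table pages rather than count them.

```python
class OutOfMemory(Exception):
    pass


class TablePool:
    """Hypothetical pool of page-table pages owned by one VM."""

    def __init__(self, available):
        self.available = available  # pages the pool can hand out
        self.reserved = 0           # pages promised to in-flight binds

    def reserve(self, n):
        if n > self.available - self.reserved:
            raise OutOfMemory(f"need {n} page-table pages")
        self.reserved += n

    def take(self):
        assert self.reserved > 0, "commit used more pages than it reserved"
        self.reserved -= 1
        self.available -= 1


def vm_bind(pool, tables_needed, applied_ops):
    # Phase 1 (fallible): reserve everything up front, while the ioctl can
    # still return an error to userspace.
    pool.reserve(tables_needed)
    # ---- point of no return ----
    # Phase 2 (infallible): only consume pages that were reserved above.
    for _ in range(tables_needed):
        pool.take()
    applied_ops.append(("bind", tables_needed))
```

If reservation fails, nothing has been applied, so the error can be returned from the ioctl instead of poisoning the VM.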
15:01 bbrezillon: 'pte setup in the execbuf ioctl' => hm, how would that work with async vm_bind ops (where ops depend on other fences being signaled, potentially moving things around in the page table)
15:01 danvet: so you probably also have no need for an out fence
15:01 danvet: bbrezillon, yeah you don't have much async vm_bind
15:01 danvet: since no gpu pipelining
15:01 danvet: also you still need the execbuf path in case the shrinker or something nuked your working set
15:01 bbrezillon: by async, I mean it still needs to wait for the deps
15:02 danvet: in full generality at least, I'm not sure how full featured your panfrost is
15:02 danvet: bbrezillon, you shouldn't need to wait for any deps
15:02 bbrezillon: (didn't hook up the shrinker yet)
15:02 danvet: or at least I thought we ditched that part outright since mesa vk really doesn't need it
15:03 danvet: (on intel side lvl0 has opinions, but personally I'm leaning a bit towards that they're wrong, since all we do is trade a kernel thread for a userspace thread)
15:03 danvet: bbrezillon, drm_mm feels a bit a mismatch
15:03 bbrezillon: how do you implement vkQueueBindSparse()? Waiting for deps in a userspace thread?
15:03 danvet: I do think a drm_vma or drm_vm would be nice, since the real tricky thing here is the locking
15:04 danvet: bbrezillon, set the magic mesa vk bit to get a userspace queue
15:04 danvet: for that you need to ask jekstrand
15:04 bbrezillon: ok
15:04 danvet: once you have that you really only need the out_syncobj on dgpu so that you can benefit from a bit deeper pipelining on the gpu side of things
15:04 danvet: that was at least the idea
15:04 bbrezillon: so no VM_BIND ioctl needed, just synchronous VM_MAP/UNMAP ones
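A sketch of the userspace-queue idea: a submit thread waits on each job's dependencies and then issues a synchronous bind, so the kernel never has to queue work behind unsignalled fences. threading.Event stands in for a syncobj and do_bind for a hypothetical synchronous VM_MAP ioctl; this is not Mesa's actual queue implementation.

```python
import queue
import threading


class SubmitThread:
    """Userspace submit thread: waits on each job's dependencies, then
    performs a synchronous bind and signals the job's out fence."""

    def __init__(self, do_bind):
        self._jobs = queue.Queue()
        self._do_bind = do_bind
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, waits, bind_args, signal):
        """Queue a bind and return without waiting on its dependencies."""
        self._jobs.put((waits, bind_args, signal))

    def _run(self):
        while True:
            waits, bind_args, signal = self._jobs.get()
            for fence in waits:
                fence.wait()  # the blocking wait happens in userspace, not the kernel
            self._do_bind(*bind_args)
            signal.set()
```

Jobs are processed in submission order, which matches the in-order semantics of a single queue.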
15:05 danvet: well that's vm_bind/unbind really in your case
15:05 danvet: or was meant to be
15:05 danvet: I'm maybe a bit out of the loop
15:05 danvet: also I think jekstrand made some noises about yet another silly corner case and maybe we again need the kernel thread
15:05 danvet: in that case I think a drm_vma.c which extract these common pieces into a helper would be really good
15:06 danvet: since too many ways to get it wrong, and at least with the mesa queue helpers there really should be just one standard way to do this across drivers
15:06 danvet: and no need for per-driver implementation with slightly different semantics
15:06 danvet: since then you'd just need to absorb these mismatches again on the mesa side, which makes no sense at all
15:07 danvet: (aside from more fundamental stuff like dgpu and their gpu move pipelining, which isn't really a thing on integrated in many cases)
15:08 bbrezillon: well, when I said MAP/UNMAP that was just a way to differentiate from the xe/i915 VM_BIND ioctl, which is async by nature
15:09 jekstrand: DavidHeidelberg[m]: No, I've not looked at it at all
15:09 bbrezillon: re drm_vma.c => that'd be great to have
15:13 bbrezillon: FYI, I've been using drm_mm because I have a flag that allows auto-VA assignment (in the 0x800000000000-0xffffffffffff VA range), but that's probably not needed if we let userspace control the VA space (I guess mesa's util_vma is here to automate that a bit)
15:15 danvet: bbrezillon, yeah imo for new uapi just let userspace deal with the entire layout
15:15 danvet: it's really not hard to have a va allocator
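As danvet says, a userspace VA allocator is not much code. Below is a minimal first-fit allocator over the 0x800000000000-0xffffffffffff range bbrezillon mentions, written in the spirit of Mesa's util_vma heap but not derived from its API; it does no double-free checking.

```python
class VaAllocator:
    """First-fit VA allocator over the half-open range [start, end)."""

    def __init__(self, start, end):
        self.holes = [(start, end)]  # sorted, non-overlapping free ranges

    def alloc(self, size, align):
        for i, (lo, hi) in enumerate(self.holes):
            addr = (lo + align - 1) // align * align  # round up to alignment
            if addr + size <= hi:
                new = []
                if lo < addr:
                    new.append((lo, addr))          # leftover before the allocation
                if addr + size < hi:
                    new.append((addr + size, hi))   # leftover after the allocation
                self.holes[i:i + 1] = new
                return addr
        return None  # no hole is large enough

    def free(self, addr, size):
        self.holes.append((addr, addr + size))
        self.holes.sort()
        merged = []
        for lo, hi in self.holes:
            if merged and merged[-1][1] == lo:
                merged[-1] = (merged[-1][0], hi)    # coalesce adjacent holes
            else:
                merged.append((lo, hi))
        self.holes = merged


va = VaAllocator(0x800000000000, 0x1000000000000)
print(hex(va.alloc(0x1000, 0x1000)))
```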
15:16 danvet: wrt vm_bind being async, I'm still hoping that it's as non-async as we can get away with
15:16 bbrezillon: danvet: let me check if I got this right. drm_{vma,vm}.{c,h}, and generic VM_{CREATE,DESTROY,BIND,UNBIND} ioctls
15:16 danvet: because memory fence queues in the kernel are just a funny idea
15:16 danvet: as in should probably cgroup account that stuff somehow at least :-)
15:16 danvet: bbrezillon, I think jekstrand mentioned to also go full generic ioctl
15:17 danvet: I'm not sure whether that's a step too far, we might be in extensions overkill land with that
15:17 danvet: but maybe default implementations and stuff like that
15:17 danvet: so driver parses its own ioctl struct and does any handling of special cases
15:17 danvet: and then you just stuff that into the helper and done
15:17 danvet: with a few driver callbacks or something like that
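A rough shape of the split danvet sketches: the driver parses its own ioctl struct, and a common helper owns the VMA tracking and locking while calling back into the driver for page-table updates. VmHelper and driver_vm_bind_ioctl are hypothetical names; drm_vm/drm_vma are only being proposed in this discussion.

```python
import threading


class VmHelper:
    """Hypothetical common helper: tracks VMAs and owns the locking;
    the driver supplies the page-table callbacks."""

    def __init__(self, map_cb, unmap_cb):
        self._lock = threading.Lock()
        self._vmas = {}            # start address -> (size, bo)
        self._map_cb = map_cb      # driver callback: write PTEs
        self._unmap_cb = unmap_cb  # driver callback: clear PTEs

    def bind(self, addr, size, bo):
        with self._lock:
            for start, (sz, _) in self._vmas.items():
                if addr < start + sz and start < addr + size:
                    raise ValueError("range overlaps an existing mapping")
            self._map_cb(addr, size, bo)
            self._vmas[addr] = (size, bo)

    def unbind(self, addr):
        with self._lock:
            size, _ = self._vmas.pop(addr)
            self._unmap_cb(addr, size)


def driver_vm_bind_ioctl(helper, args):
    # Driver side: parse the driver-specific ioctl struct, then hand off.
    helper.bind(args["va"], args["size"], args["bo_handle"])
```

The point of the split is that the tricky shared part (data structure plus locking) has one implementation, while drivers keep their own uapi and page-table code.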
15:18 bbrezillon: sounds good
15:18 bbrezillon: yeah, you'd need callbacks for the page table ops
15:18 danvet: we've gone 100% driver specific in rendering, and I think we should make sure to not overcorrect here
15:18 danvet: because dri1 tried to do that, and it was a dumpster fire
15:19 danvet: and I think consensus from the cross platform teams is still that fully generic memory management isn't great (like on windows), endless amounts of driver bypass/extensions needed
15:19 danvet: I think agd5f_ has a solid take on this too (or at least I remember from some discussions way way back)
15:20 danvet: bbrezillon, but if we can get the big data structures and the locking/async sorted, then I think that's a huge win in at least semantic commonality
15:20 danvet: kinda like the sched stuff really isn't much, but it's /just/ enough to make reading random drivers easier
15:21 danvet: especially on the tricky bits like cross engine/sched reset impact
15:21 bbrezillon: is that really sorted out? :P
15:25 bbrezillon: before I go and try to extract drm_{vm,vma} bits out of the xe driver, is this something someone in the intel team is planning to work on/working on already?
15:25 bbrezillon: danvet: ^
15:26 danvet: I'm no longer on the xe team, so maybe ask on #xe ...
15:27 danvet: it was on a vague notion of an idea way back
15:30 siddh: Hi, can anyone review patches 1 to 7 of https://lore.kernel.org/lkml/cover.1673269059.git.code@siddh.me/ ?... The last 3 patches have been merged. Thanks.
15:35 jekstrand: bbrezillon: RE: userspace threads. Vulkan working group discussions started yet again about this and I think we're going to nuke vkCmdWaitEvents on host signals as an actual command buffer blocking operation. That would mean userspace threads are fine. There are tests that are still problematic, though. I don't remember where that discussion is at off-hand.
15:36 jekstrand: Sadly, I keep going back-and-forth on that. :(
15:37 jekstrand: danvet, bbrezillon: Big fan of doing something in common code. I didn't use drm_mm in Xe mostly because I didn't know about it and/or didn't think of using it for that. I was under the impression that it was mostly for allocating virtual addresses (which we don't need) rather than managing them.
15:37 dj-death: are you allowed to have 2 descriptions of the same binding in glsl but slightly different :
15:37 dj-death: layout(set = 0, binding = 0, rgba8) readonly uniform highp image1D u_image[2];
15:37 dj-death: layout(set = 0, binding = 0) readonly uniform highp image1D u_image2[2];
15:37 dj-death: in the same shader
15:38 dj-death: like 2 variables somewhat similar but not quite
15:38 bbrezillon: jekstrand: well, it can do both, but it comes at a cost (keeping track of holes to allow fast VA allocation)
15:39 danvet: jekstrand, I'm always succeeding in purging the details from my brain, but why exactly is this something the kernel can do which userspace cannot?
15:39 bbrezillon: the drm_mm struct is definitely bigger than what we need for a purely userspace-based VA allocator
15:39 danvet: like the only one I remember is making a dma_fence container thing out of a memory fence
15:39 danvet: and neither the kernel nor userspace really can do that
15:40 danvet: (and yes it hangs around in code, it's awkward)
15:40 jekstrand: dj-death: Yes
15:40 dj-death: jekstrand: arg...
15:40 jekstrand: dj-death: And, yeah, ANV might get that wrong. :-(
15:40 danvet: bbrezillon, yeah my take is to just have an absolute minimal drm_vma and drm_vm struct with emphasis on locking
15:40 danvet: which drm_mm explicitly leaves to users
15:40 dj-death: yeah I mean it's the only reason the max length for storage image descriptor is 128 bytes ;)
15:41 danvet: so you could use drm_mm internally, but it's not really a fit
15:41 jekstrand: dj-death: For SKL+, we can work around it pretty easily, though. I've just not typed the 3 patches.
15:42 jekstrand: danvet: Vulkan has a future fence. ish. I'm trying to kill it at every opportunity.
15:42 dj-death: jekstrand: I thought you did use the lowered surface in some cases
15:42 danvet: yeah I have vague memories of a shadow :-P
15:42 bbrezillon: danvet: I'm perfectly fine with drm_vma being a brand new object, it's just that I started with a hybrid VA allocation model, and drm_mm was convenient for that
15:42 danvet: yeah for hybrid drm_mm is convenient
15:43 danvet: maybe we should just use it internally and let drivers open-code if they want hybrid
15:43 jekstrand: dj-death: On SKL+, if you do a typed read from a surface with a format that doesn't support reads, you get R8, R16, R32, or RG32 data as appropriate.
15:43 javierm: tzimmermann: weird, commit 622113b9f11f ("drm/ssd130x: Replace simple display helpers with the atomic helpers") broke the ssd13x-spi driver but not ssd13x-i2c (where I tested when originally posting that)
15:43 danvet: maybe for some transitional stuff, but otoh I'm not sure anyone needs that
15:43 jekstrand: dj-death: On BDW and earlier, it'll hang. SKL falls back to a nice raw read for you.
15:43 danvet: xe is new and I haven't seen amdgpu really need much for their kernel-picked address mode either
15:43 javierm: tzimmermann: the change is in the core drivers/gpu/drm/solomon/ssd130x.c driver, shared by the I2C and SPI drivers... and it just uses a regmap. I wonder what could make the difference
15:44 dj-death: jekstrand: okay, well BDW is another driver :)
15:44 tzimmermann: javierm, one sec
15:44 jekstrand: dj-death: Right now, though, the lowering code tries to be more clever with that about formats so when it does the format conversion, it might assume RGBA8_UINT or something because it's less code in the shader.
15:44 jekstrand: dj-death: Yes. :)
15:44 jekstrand: The lowering code, however, is common.
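For context on the fallback jekstrand describes: when a typed read returns raw R32 data for a 32bpp format, the shader has to unpack the channels itself. The sketch below does that for RGBA8_UNORM in Python, assuming the red channel lives in the low byte; it illustrates the conversion, not the actual Mesa lowering pass.

```python
def unpack_rgba8_unorm(raw):
    """Convert one texel read back as raw R32_UINT into RGBA8_UNORM floats.

    Assumes R is in bits 0-7, G in 8-15, B in 16-23 and A in 24-31.
    """
    return tuple(((raw >> (8 * i)) & 0xFF) / 255.0 for i in range(4))


# Raw read format by bits per pixel, per the SKL+ behaviour described above.
RAW_FORMAT_FOR_BPP = {8: "R8", 16: "R16", 32: "R32", 64: "RG32"}

print(RAW_FORMAT_FOR_BPP[32], unpack_rgba8_unorm(0xFF0080FF))
```

Assuming a different format for the conversion (RGBA8_UINT instead of the declared one, say) gives different channel values from the same raw bits, which is why two differently declared variables aliasing one binding can go wrong.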
15:45 jekstrand: danvet, bbrezillon: I see no reason why kernel VA picking and VMA management need to be the same thing.
15:46 jekstrand: You can always strap a VA picker on the front like we're effectively doing by putting it in userspace.
15:46 danvet: yeah worst case we have a drm_mm on the side or something :-)
15:46 jekstrand: If keeping that data adds cost to every driver that uses the common thing, I say torch it.
15:46 jekstrand: Yeah, that's what I meant
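"Strapping a VA picker on the front" of VMA management can be as small as the following sketch: userspace picks addresses from a range it owns and only then asks the kernel to map them. The struct and functions are hypothetical; this is a bump allocator that never reuses freed ranges, whereas a real picker (Mesa's util_vma_heap, for example) tracks holes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace VA picker over [next, end). */
struct va_picker {
   uint64_t next;  /* next free address */
   uint64_t end;   /* exclusive upper bound of the managed range */
};

static void
va_picker_init(struct va_picker *p, uint64_t start, uint64_t end)
{
   assert(start <= end);
   p->next = start;
   p->end = end;
}

/* On success stores an aligned address in *va and returns true.
 * align must be a non-zero power of two. Freed ranges are never reused. */
static bool
va_picker_alloc(struct va_picker *p, uint64_t size, uint64_t align,
                uint64_t *va)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   uint64_t addr = (p->next + align - 1) & ~(align - 1);

   /* Reject rounding overflow, a start past the end, or too little space. */
   if (addr < p->next || addr > p->end || size > p->end - addr)
      return false;

   p->next = addr + size;
   *va = addr;
   return true;
}
```

The kernel side then only has to validate and map the address it is given, which is the separation between VA picking and VMA management argued for above.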
15:46 bbrezillon: danvet: oh, I'm definitely not trying to convince you to use drm_mm, if there's common bits to deal with VA allocation in mesa, I'd gladly use it
15:47 danvet: I don't worry about the memory usage, but just the potential fun if we let drivers come up with "interesting" hybrid models
15:47 danvet: it's a can of worms
15:47 danvet: so some strong guidance of the helpers to not do that sounds like a good idea
15:47 bbrezillon: and for kernel allocations, I guess I can use drm_mm and reserve the high VA range
15:47 danvet: bbrezillon, yeah for the kernel address space or whatever you can just whack a drm_mm on top of it
15:48 danvet: if you don't have a separate vm for kernel stuff (funny gpu design tbh) then the kernel/userspace gpu vm split becomes uapi anyway
15:48 tzimmermann: javierm, backlight and SSD130X_DISPLAY_ON moved from crtc to encoder
15:48 danvet: so maintaining a drm_mm just for the kernel portion should be fine
15:49 tzimmermann: could this be related?
15:50 bbrezillon: danvet: nope, no dedicated VM for kernel stuff on mali
15:50 javierm: tzimmermann: hmm, don't think so since otherwise the I2C panel would also be affected
15:50 jekstrand: What do you mean by "interesting hybrid models"? Some kernel-assigned and some userspace-assigned?
15:50 jekstrand: Yeah, let's very much not do that.
15:50 javierm: tzimmermann: it's very strange. I guess I'll need to dig deeper into what's going on
15:51 jekstrand: Which is to say that I think Asahi and IMG both need that but we can handle it by userspace giving the kernel a range on boot or the kernel having a device query which is "here are my ranges, don't touch them."
15:51 bbrezillon: there's one for the FW, but you still have to map kernel allocated BOs to the user VM (like ring buffers, since the whole thing is designed with userspace scheduling in mind)
15:51 jekstrand: Not a range on boot. On device creation.
15:51 jekstrand: Before any VMs are created.
15:51 jekstrand: Or as part of VM creation.
15:52 jekstrand: bbrezillon: Ok, then we need userspace to give the kernel a range somehow.
15:52 bbrezillon: yep, and I assumed the regular user/kernel split we have for CPU
15:52 jekstrand: Oh, top bit set/not?
15:52 bbrezillon: yep
15:52 bbrezillon: that's assuming the GPU can support 48-bit VA
15:53 jekstrand: Yeah, that's a reasonable convention assuming both are 48-bit VA.
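The "top bit set/not" convention can be sketched as below, assuming a 48-bit GPU VA where bit 47 selects the kernel half, analogous to the CPU user/kernel split. The macro and function names are hypothetical; nothing here is an existing Mali or DRM interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 48-bit GPU VA; bit 47 set means "kernel half". */
#define GPU_VA_BITS       48
#define GPU_VA_KERNEL_BIT (1ull << (GPU_VA_BITS - 1))
#define GPU_VA_MASK       ((1ull << GPU_VA_BITS) - 1)

static bool
gpu_va_is_kernel(uint64_t va)
{
   assert((va & ~GPU_VA_MASK) == 0);  /* address must fit in 48 bits */
   return (va & GPU_VA_KERNEL_BIT) != 0;
}

/* Userspace may only bind non-empty ranges inside [0, 2^47). */
static bool
gpu_va_range_is_user(uint64_t va, uint64_t size)
{
   return size != 0 && va < GPU_VA_KERNEL_BIT &&
          size <= GPU_VA_KERNEL_BIT - va;
}
```

Because the split is fixed by convention rather than negotiated, it effectively becomes uAPI, which is the concern raised about GPUs with a smaller VA (e.g. 36 bits) and about SVM.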
15:53 bbrezillon: but I honestly hope all new GPUs do
15:53 jekstrand:looks at the stupid 36-bit Intel GPU.
15:53 tzimmermann: javierm, i don't see much of a difference
15:53 bbrezillon: jekstrand: let me rephrase that, all new Mali GPUs do :D
15:53 javierm: tzimmermann: yeah, me neither. It's strange, really
15:54 jekstrand: bbrezillon: Yeah, and Mali isn't likely to have to deal with more than 48 bits CPU-side any time soon.
15:54 bbrezillon: they expose the VA range through a feature reg, so in theory, I guess that's still configurable
15:55 jekstrand: My biggest concern with doing it all by convention is if it's going to cause problems with SVM down the line.
15:55 javierm: tzimmermann: but no worries, just in case it could ring a bell to you. I just noticed because I booted an rpi4 that had an SPI panel connected, but I will take a look this weekend or something
15:55 bbrezillon: jekstrand: but SVM is still subject to the regular kernel/userspace VA split, isn't it?
15:56 jekstrand: On anything that can be plugged into a PCIe slot, you have to deal with 56-bit (I think) on some of the bigger Xeons.
15:56 jekstrand: bbrezillon: Yes, but ^^
15:56 bbrezillon: (not my concern then :P)
15:56 jekstrand: bbrezillon: Also, if the kernel convention ever changes (seems pretty unlikely), that would break everything.
15:56 jekstrand: Is that considered immutable uAPI by now?
15:56 bbrezillon: I think that would break a lot of things anyway
15:57 jekstrand: Yeah
15:57 jekstrand: In that case, it's probably a fine choice for Mali.
15:57 jekstrand: Asahi and IMG I think are more restricted.
15:57 bbrezillon: but, let's say we want to make it configurable, it shouldn't be such a big deal
15:57 jekstrand: But, still, they can hand in some reserved VA ranges on VM creation and call it a day.
15:57 bbrezillon: as long as the kernel is left a reasonable amount of VA space
15:58 bbrezillon: > But, still, they can hand in some reserved VA ranges on VM creation and call it a day
15:58 bbrezillon: exactly
15:59 bbrezillon: I don't really mind doing that for mali as well
15:59 bbrezillon: it's just a matter of passing extra args to the VM_CREATE ioctl
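Handing in reserved VA ranges at VM creation could look roughly like this. The struct layout, field names, 4 KiB alignment rule and minimum-size check are all hypothetical, meant only to show the kind of validation a kernel would do on extra VM_CREATE arguments; this is not the pancsf, Asahi or IMG uAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VM_CREATE arguments: userspace tells the kernel which VA range
 * it may use for its own mappings (ring buffers and the like). */
struct hypothetical_vm_create {
   uint64_t kernel_va_start;  /* first VA reserved for kernel mappings */
   uint64_t kernel_va_range;  /* size of that reservation, in bytes */
   uint32_t vm_id;            /* out: handle of the new VM */
   uint32_t pad;              /* must be zero, keeps 64-bit alignment */
};

/* Kernel-side checks: the reservation is page aligned (4 KiB assumed), lies
 * inside the GPU VA space and is at least min_kernel_range bytes. */
static bool
vm_create_args_valid(const struct hypothetical_vm_create *args,
                     unsigned va_bits, uint64_t min_kernel_range)
{
   assert(va_bits > 0 && va_bits < 64);
   uint64_t va_end = 1ull << va_bits;

   if (args->pad != 0)
      return false;
   if ((args->kernel_va_start | args->kernel_va_range) & 0xfffull)
      return false;
   if (args->kernel_va_range < min_kernel_range)
      return false;
   if (args->kernel_va_start >= va_end ||
       args->kernel_va_range > va_end - args->kernel_va_start)
      return false;
   return true;
}
```

Passing the range explicitly keeps the top-bit split as just one possible policy that userspace can choose, rather than something baked into the kernel.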
16:18 bbrezillon: danvet, jekstrand: on to my next question now. Is the dma_resv attached to the VM needed if we don't support async bind/unbind?
16:29 bbrezillon: and I'm also not sure I understand why we need to add the job out_fence to external BOs (xe_vm_fence_all_extobjs()). I thought we wanted to have explicit fencing everywhere, and when interacting with actors relying on implicit fencing, use the DMA_BUF_IOCTL_{IMPORT,EXPORT}_SYNC_FILE ioctls.
16:29 danvet: bbrezillon, there's 4 fences in the dma_resv, 2 for the kernel and 2 for implicit sync
16:30 danvet: the 2 for the kernel are needed for the shrinker to know what's up and stuff like that
16:30 danvet: so without shrinker you don't need it
16:30 danvet: but if you want eventually a shrinker, you need that and also the per-vm dma_resv trick for good fastpath
16:30 danvet: that 2+2 thing was the point behind König's huge refactoring for dma_resv fence slots
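A userspace model of the "2+2" split, mirroring the kernel's enum dma_resv_usage, where the ordering is KERNEL < WRITE < READ < BOOKKEEP and a query for one usage also returns fences of every lower usage. Mapping danvet's "2 for the kernel" to KERNEL and BOOKKEEP, and "2 for implicit sync" to WRITE and READ, is our reading of the discussion. The type and function names below are made up; only the ordering and the read/write rule (which mirrors dma_resv_usage_rw()) follow the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of enum dma_resv_usage; order matters. */
enum resv_usage {
   RESV_USAGE_KERNEL,    /* kernel memory management (moves, clears) */
   RESV_USAGE_WRITE,     /* implicit sync: writers */
   RESV_USAGE_READ,      /* implicit sync: readers */
   RESV_USAGE_BOOKKEEP,  /* kernel-only tracking, ignored by implicit sync */
};

struct fence_slot {
   enum resv_usage usage;
   bool signaled;
};

/* Unsignaled fences a waiter querying with `usage` has to wait for:
 * every fence whose usage is <= the requested one. */
static size_t
fences_to_wait(const struct fence_slot *f, size_t n, enum resv_usage usage)
{
   assert(f != NULL || n == 0);
   size_t count = 0;
   for (size_t i = 0; i < n; i++)
      if (f[i].usage <= usage && !f[i].signaled)
         count++;
   return count;
}

/* Implicit sync: a writer waits for readers and writers, a reader only
 * for writers. */
static enum resv_usage
implicit_sync_usage(bool write)
{
   return write ? RESV_USAGE_READ : RESV_USAGE_WRITE;
}
```

An eviction or shrinker path queries with the highest usage and so sees every fence, which is why it needs the kernel slots even when userspace does all synchronization explicitly.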
16:34 gawin: kernel guys: is there a deadline for legacy gpu drivers? I'm relatively close to get my AGP platform working (so I could check if drivers are in usable state).
17:04 q66: the legacy gpu drivers have long been useless anyway
17:05 q66: they are dri1 drivers, which means modesetting is done by the xorg ddx
17:06 q66: the xorg ddx is therefore going to work even without the kernel driver (but you won't get acceleration, but for that you'd also need the mesa counterpart which was dropped in mesa 7.7 ages ago and current mesa can't even load the .so's)
17:06 bbrezillon: danvet: hm, ok. I need to dig further to understand how the xe shrinker works, and what it'd look like in pancsf
17:12 soreau: I was able to run wayfire on legacy r300 driver on RV350 some months ago before the legacy drivers were split, haven't tried with amber branch yet though
17:13 soreau: almost typed 'lagacy', ha
17:14 gawin: q66: this sounds right, once I was trying to get 3d driver for voodoo running, but without success
17:20 MrCooper: soreau: r300 is a Gallium driver, still alive and well
17:20 soreau: MrCooper: oh, I'm confused then
17:29 dalurka: q66: but they have sentimental value :)
17:35 jekstrand: bbrezillon: Part of the point of the dma_resv in the VM is also so that VM-tied BOs (most of them) don't need to be individually locked and fences extracted when you go to exec. You can just lock the VM dma_resv, grab its fences, and get 95% of your needed synchronization in one go.
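The locking shortcut described here can be modelled as follows, assuming VM-private BOs point at the VM's own reservation object while external (shared) BOs keep their own. The structs are hypothetical simplifications, not xe's data structures, and each external BO is assumed to appear only once in the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified objects. */
struct resv { int id; };

struct vm {
   struct resv resv;     /* shared by every VM-private BO */
};

struct bo {
   struct resv *resv;    /* == &vm->resv for VM-private BOs */
};

static bool
bo_is_external(const struct vm *vm, const struct bo *bo)
{
   return bo->resv != &vm->resv;
}

/* Distinct reservation objects an exec must lock: the VM's resv once,
 * plus one per external BO (each assumed to be listed once). */
static size_t
exec_lock_count(const struct vm *vm, const struct bo *const *bos, size_t n)
{
   assert(vm != NULL);
   size_t locks = 1;  /* one lock covers all VM-private BOs */
   for (size_t i = 0; i < n; i++)
      if (bo_is_external(vm, bos[i]))
         locks++;
   return locks;
}
```

Only the external BOs still need individual locking and fence handling, which is also the set xe_vm_fence_all_extobjs() walks.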
17:36 alyssa: more to the point, "legacy drivers" is less about the age of the hardware and more the maintenance / modernity of the code
17:37 alyssa: as far as I'm concerned, Lima is a modern driver
17:37 alyssa: It's Gallium, it's NIR, it does everything expected of a modern driver, it's just as modern as the latest 2022 shiny
17:38 alyssa: Sort of irrelevant that it's from 2008
17:38 jekstrand: Well, one day it'd be nice to require integers...
17:38 alyssa: grumble
17:39 alyssa: for better or worse Lima is shipping on new hw
17:39 alyssa: somehow
17:39 jekstrand: It's free. That's how.
17:39 alyssa: ;-D
17:39 dottedmag: oh, the company behind Mali is from Norway. This explains the codenames.
17:41 dottedmag: also, indirectly age of hardware matters: no new hw -> no users / no sponsors -> no maintenance
17:41 q66: gawin: i investigated running r128 on powerpc, so yeah, similar boat
17:41 q66: in a way
17:42 q66: though on powerpc the ddx does not actually deal with modesetting as it's used together with fbdev kernel driver (because on non-x86 you don't have vbe, which the ddx relies on otherwise)
17:42 q66: it's expected that the display is already set up by fbdev
17:42 q66: you can use this path on x86 too if you explicitly activate it
17:45 q66: alyssa: doesn't that also preclude some old hardware by definition though
17:46 q66: wouldn't a gallium driver require a gpu that is capable at least of some degree of programmable shader stuff
17:47 q66: which would mean at least opengl 2.0 capability
17:49 MrCooper: indeed it does, which ~20 years old HW such as R300 satisfies
17:50 q66: yeah but r128 or voodoo would not
17:50 q66: they are fixed function hardware
17:51 q66: r300 is radeon 9500+ iirc? so that's about the minimum
17:51 gawin: maybe r200/nv20 could be also used, but without something similar to gallium nine for d3d8 it wouldn't make much sense
17:51 gawin: (d3d8 also has shaders)
17:52 q66: r200 will support pixel shaders but not fragment shaders
17:52 q66: i think you need both
17:52 MrCooper: R200 had more or less fully functional vertex shaders only, no pixel shaders
17:52 karolherbst: use nv20 for what?
17:53 gawin: d3d8, but that's it
17:53 karolherbst: heh
17:53 MrCooper: that'd be Gallium eight then? :P
17:53 karolherbst: given that the nv20 driver was classic ...
17:54 q66: r200 is quite broken nowadays in practice too
17:54 karolherbst: yeah.. and even nv30 doesn't really work
17:54 q66: but then so is r300 to a large degree
17:54 karolherbst: I suspect everything before having _sane_ shaders doesn't really work :)
17:54 karolherbst: for nv that would be pre nv50
17:55 q66: nv40 sorta works
17:55 karolherbst: mhhh
17:55 q66: unless you have a big endian system
17:55 q66: then it sorta works but worse
17:55 karolherbst: I mean.. we have been fixing a few bugs
17:56 karolherbst: but yeah.. probably good enough for xfce
17:56 q66: almost not
17:57 q66: I'd expect nv50 to be busted to almost the same degree but can't actually say
17:57 q66: i only have an nv40, in a powermac g5
17:57 karolherbst: nv50 actually works
17:58 karolherbst: compared to nv30 that is
17:58 q66: and I only ever have it do anything when booting the machine so that it actually boots (because the firmware expects an openfirmware-compatible gpu to be present)
17:58 q66: once kernel comes up it switches to an R5 235 in another pcie slot
19:09 agd5f: gawin, I think the "shaders" on r200 supported like 8 instructions. Not really programmable per se
19:09 airlied: bbrezillon, danvet, jekstrand : please read the mailing list
19:10 airlied: generic gpuva mgr code posted a few days ago and is already being discussed
19:10 airlied: as part of the nouveau work
19:11 agd5f: danvet, bbrezillon yeah, windows guys are always complaining about vidmem (MS memory manager). We do pretty ridiculous things in the driver and hardware to deal with it
19:14 danvet: airlied, oops I'm behind a bit :-)
19:15 airlied: danvet: it uses drm_mm, but the first request was to move away
19:15 jekstrand: airlied: right. I should go read that
19:19 jekstrand: dcbaker: Got syn to pull in. Had to tweak your meson wrappers a bit, though.
19:36 jenatali: Uh... VK CTS seems to be ignoring the quadOperationsInAllStages bit?
19:37 jekstrand: That's possible
19:37 jenatali: Now do I turn off quads, or do I turn off subgroups in VS/GS?
19:44 zmike: turn off pc
19:50 DavidHeidelberg[m]: zmike:
19:50 DavidHeidelberg[m]: next piglit uprev is in a few minutes
19:51 zmike: hooray
19:51 DavidHeidelberg[m]: zmike: what change do you want? Also you may drop your change from Marge
19:51 zmike: the piglit MR I just landed
19:52 DavidHeidelberg[m]: ok, I'll uprev the piglit by one commit more
19:54 DavidHeidelberg[m]: I dropped your Mesa MR, I'll send it together with the uprevved piglit.
19:54 zmike: k
19:54 zmike: ty
19:54 DavidHeidelberg[m]: np
20:10 DemiMarie: What does Intel PXP actually do, and is it useful for anything besides DRM?
20:12 zmike: DavidHeidelberg[m]: just to double check I meant this one https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/759
20:13 DavidHeidelberg[m]: zmike: yes, already in my pipeline
20:13 DavidHeidelberg[m]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20788
20:13 zmike: 👍
21:06 DavidHeidelberg[m]: Fishing for ACKs. It's ci-fairy uprev, wget -> curl migration and piglit uprev. https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20788 So it's a "huge MR", but on the other hand it tries to solve failures on HTTP 500 and similar errors, so it "should" improve reliability of CI these days.
21:44 jekstrand:now has a proc macro!
21:44 jekstrand: dcbaker: ^^
21:46 DavidHeidelberg[m]: I'm pushing to Marge, if anyone has serious objections speak now or keep silent until next uprev :P
21:47 dcbaker: jekstrand: not surprised that you needed tweaks
22:42 alyssa: jenatali: "Branch management is getting a bit unwieldly"
22:42 alyssa: now you know why I have agx/next and panfrost/next :-p
22:42 jenatali: Yeah
22:43 alyssa: panfrost/next is utterly boring
22:43 alyssa: but with a certain magic environment variable, agx/next advertises GL3.1 ...
22:44 jenatali: At some point I'm looking forward to getting back to GL. The D3D bits for 4.3 support have landed, it just needs to be hooked up
22:44 jenatali: And most of the hard work for up through 4.6 is probably done now thanks to VK
22:45 alyssa: i'm stuck at GL3.1 on both my drivers because geometry shaders T_T
22:46 jenatali: I'm so close to VK1.1, just a couple more things to finish up now that multiview and subgroups are done
22:58 alyssa: Woo :D
23:11 HdkR: alyssa: Luckily once you implement GS for one, you will know all the problems for the other driver :P