01:00AndrewR: it seems "st/mesa: Drop the TGSI paths for PBOs and use nir-to-tgsi if needed." broke piglit/bin/pbo-teximage on nv50 ....
01:05imirkin: anholt: --^
01:34AndrewR: issued .. https://gitlab.freedesktop.org/mesa/mesa/-/issues/3680 while I think it may be some uninitialized memory somewhere? Running NV50_PROG_USE_NIR=1 bin/pbo-teximage few times gives different results each time ...(still fail, just numbers and color area in window are different)
02:17anholt: AndrewR: can you get some before/after dump of the tgsi? (any NIR_PRINT=1 too, probably)
02:22mareko: AndrewR: !6946 depends on my st/mesa commits recently merged
02:32AndrewR: moment, have some RL things to do ....
03:00AndrewR: anholt, nir print resulted in some 700kb file (uncompressed). Ok to attach to issue ?
03:01anholt: maybe NIR_TO_TGSI_DEBUG=1 would be better
03:02AndrewR: NIR_TO_TGSI_DEBUG=1 doesn't print anything with those three commits reverted ....
03:08anholt: sure. you should have your tgsi dump according to whatever the debug flag is for your driver enabled, to get tgsi output
03:52AndrewR: "../src/gallium/frontends/va/surface.c:487:28: error: use of undeclared identifier 'VA_RT_FORMAT_YUV420_10'" ...owwww..
07:55AndrewR: so, apparently my libva is just too old, so it only has VA_RT_FORMAT_YUV420_10BPP vs VA_RT_FORMAT_YUV420_10. I wonder if a #define right there in mesa will do the job ....
13:14nashpa: hello, I was trying to push a patch in drm-misc-next with dim, I've got a conflict on drm-intel/drm-intel-gt-next and I'm not sure how to fix it
13:40danvet_: nashpa, was it your patch that's causing the conflict?
13:41danvet_: nashpa, rebuilds just fine here, I guess you raced
13:41ickle: no, I was in the middle of resolving; took time to recompile
13:41danvet_: yeah that's what I meant, raced with someone else
13:42ickle: took time to answer as well
14:33nashpa: no, my patch was on drm-misc-next for the komeda driver
14:53robclark: danvet_: I think you broke patchwork somehow.. https://patchwork.freedesktop.org/series/82927/
14:56danvet_: robclark, yeah I broke it real bad :-/
14:56danvet_: resent as a new thread
15:01vsyrjala: X-Patchwork-Hint: comment
15:02danvet_: vsyrjala, nah, it was a misplaced git send-email
15:02danvet_: I forgot the -1
15:02danvet_: and dumped the entire wip pile
15:02danvet_: except the one patch I wanted to resend
15:05bnieuwenhuizen: danvet_: question about getfb(2) and uabi regressions.
15:06bnieuwenhuizen: doesn't converting framebuffers to modifiers even if they were created without them break GETFB?
15:06bnieuwenhuizen: since GETFB fails with > 1 planes
15:08vsyrjala: danvet_: ah. i think you also confused it earlier with a diff looking reply though
15:09danvet_: bnieuwenhuizen, I'd call that a bugfix
15:09danvet_: since getfb has no idea whether it's a dumb userspace or something fancy
15:10bnieuwenhuizen: danvet_: but prior it worked fine?
15:10danvet_: maybe getfb should flat out fail if there's a modifier
15:10danvet_: yeah ... :-/
15:10bnieuwenhuizen: like non-modifier X11 does non-modifier single plane pageflip
15:11bnieuwenhuizen: and obs studio does getfb -> and is able to import in egl/libva
15:11danvet_: otoh no one complained about this for intel yet
15:11danvet_: and we have all the fastboot you'd want
15:11bnieuwenhuizen: I think the difference is that intel never exposed compression without modifiers
15:11danvet_: amdgpu isn't remotely close with flicker avoidance
15:11danvet_: so I think in practice we'll get away with it
15:11danvet_: or maybe not
15:11bnieuwenhuizen: with AMD we have the problem that we have a single layout that is 1 plane without modifiers and 2 with
15:11danvet_: obs uses getfb for screencapture?
15:12bnieuwenhuizen: and it is not only about flicker but ffmpeg/ and an obs branch can capture from kms
15:12bnieuwenhuizen: found this branch which is getting rebased and seems used by people: https://github.com/w23/obs-studio/tree/linux-libdrm-grab
15:12danvet_: why do we allow all these ioctls without any access checks
15:12jekstrand: It's more fun that way!
15:13bnieuwenhuizen: and ffmpeg does getfb2 but luckily only with libva which isn't very strict about some extra planes
15:13jekstrand:has no context for that statement. :P
15:13danvet_: I tried to retire the vblank one
15:13danvet_: now I'm just sad
15:13bnieuwenhuizen: but getfb2 + import into non-modifier EGL2 also fails :(
15:13bnieuwenhuizen: since there is a strict plane count check there
15:14vsyrjala: what do they even do with the getfb? it doesn't return the gem obj handle for just any random client
15:14danvet_: bnieuwenhuizen, the problem is it still goes all totally boom
15:14danvet_: even if we'd preserve the original creation arguments
15:14bnieuwenhuizen: but at this point I only see 3 solutions for AMD: (1) declare this not a regression (2) expose 1 plane + DRM_FORMAT_MOD_INVALID in getfb/getfb2 (3) always use a single plane for AMD modifiers even with compression
15:14bnieuwenhuizen: why would it go boom?
15:14danvet_: since if you run modifiered mesa with obs
15:15danvet_: obs then does getfb, gets modifiers, falls over
15:15bnieuwenhuizen: right, but at that point I'd say it is the fault of new mesa using new features?
15:15danvet_: it's now a mesa regression, but that doesn't really change the tune of the sad trombones
15:15danvet_: so if you want to avoid unhappy users, it's the same problem
15:16danvet_: for just the kernel upgrade we'd probably get away with it since niche enough use case :-P
15:16danvet_: I think this is no-win
15:16vsyrjala: do people run obs as root?
15:16danvet_: we could try to convert as much of the modifiered-fb back into implicit
15:16bnieuwenhuizen: just wanted a paper trail for not redesigning all the modifiers back to single plane
15:17bnieuwenhuizen: the ffmpeg kmsgrab is run as root yes
15:17bnieuwenhuizen: haven't tried the obs thing
15:17danvet_: but that would then break the explicit modifier use-cases
15:17danvet_: since we have no flag to say "I can modifier" for getfb1/2
15:17danvet_: adding that flag would break the stuff that already works
15:17danvet_: further fragmenting everything
15:17bnieuwenhuizen: what flag would you need?
15:17danvet_: bnieuwenhuizen, it has to be root only
15:18danvet_: bnieuwenhuizen, a flag that allows getfb1/2 to set modifiers
15:18MrCooper: bnieuwenhuizen danvet_: FWIW, Xorg drivers also use GetFB for seamless startup
15:18danvet_: plus kernel infra to recompute modifiers
15:18danvet_: MrCooper, yeah but you're not going to get something tiled from the boot splash
15:18bnieuwenhuizen: MrCooper: but pretty much always from linear where this is not really an issue
15:18danvet_: dumb gives you flat
15:18MrCooper: I wouldn't worry about capturing apps though, capturing directly from KMS can't work reliably while there's a separate display server running
15:18bnieuwenhuizen: like this is mostly about DCC
15:18bnieuwenhuizen: what do you mean?
15:19bnieuwenhuizen: ffmpeg kmsgrab worked pretty well when I tried
15:19danvet_: bnieuwenhuizen, it's unsynced
15:19danvet_: and that's unfixable
15:19MrCooper: it can't synchronize to display server flips
15:19bnieuwenhuizen: ah yes
15:19danvet_: you might capture a glClear'ed buffer
15:20MrCooper: danvet_ bnieuwenhuizen: right, but in theory the DRM master before Xorg could be anything?
15:20danvet_: MrCooper, yeah, in practice that drops the bug report rate by a few orders of magnitude
15:20danvet_: since multi-user is much less
15:21danvet_: bnieuwenhuizen, I'm surprised it works really
15:21danvet_: since in -modesetting we disable any multi-plane formats
15:21danvet_: since compression tends to make the unsynced access even more fun
15:21bnieuwenhuizen: well, the thing where it happens is non-modifier X (e.g. xf86-video-amdgpu) + fullscreen app (where we have explicit sync)
15:21danvet_: because of unsynced frontbuffer rendering
15:22bnieuwenhuizen: and then a non-modifier modeset happens
15:22bnieuwenhuizen: it is just "what do we do about the getfb2" for that case
15:22danvet_: bnieuwenhuizen, do you even get a compressed buffer to scanout without modifiers right now?
15:23danvet_: and ffmpeg captures that without hilarity?
15:23danvet_: like a fullscreen game?
15:23bnieuwenhuizen: fullscreen glxgears tested
15:23bnieuwenhuizen: I think implicit sync helps a lot in the GPU render -> GPU capture case
15:24robclark: danvet_: doesn't userspace doing getfb2 imply it groks modifiers (or at least the concept).. that was kinda the point of getfb2
15:24danvet_: robclark, apparently not
15:24danvet_: could also be "I want yuv planes"
15:24robclark: (for the record, there is a screenshot tool that a lot of cros tests use that uses getfb2 + import into egl)
15:24MrCooper: bnieuwenhuizen: even so, it may end up capturing the next frame drawn to a buffer, which would result in the video intermittently going backwards
15:24danvet_: bnieuwenhuizen, test a real game
15:25robclark: danvet_: well, getfb2 was introduced *after* modifiers
15:25danvet_: glxgears is never going to get you a split cs
15:25bnieuwenhuizen: so obs uses getfb, ffmpeg uses getfb2 but doesn't use the modifier since libva import doesn't take the modifier and the chromeos screenshot utility does egl import (without modifier if EGL does not expose the ext)
15:25danvet_: yeah libva not even having a proper modifier interface is another sad story
15:25danvet_: robclark, as if that stops people
15:26bnieuwenhuizen: so the only thing really regressing is obs (+ other tools using getfb) and the screenshot utility (but we don't care)
15:26danvet_: bnieuwenhuizen, I think best we can do is an amdgpu.ko to disable modifiers
15:26danvet_: and make that only apply to the current list of pciid
15:26robclark: danvet_: sure, but doesn't mean it is our bug
15:26danvet_: can't break future stuff
15:26danvet_: robclark, it's always our bug if a kernel upgrade breaks it
15:26robclark: fwiw, if you want to point people at how to import frames w/ getfb2, https://chromium.googlesource.com/chromiumos/platform2/+/master/screenshot
15:26bnieuwenhuizen: robclark: that also breaks if EGL doesn't do modifiers ....
15:27robclark: so, like ancient mesa or something?
15:27bnieuwenhuizen: danvet_: the alternative is doing modifiers with compression for AMD in a single plane ...
15:27danvet_: or just different mesa
15:27danvet_: or libva
15:27bnieuwenhuizen: not that I like that
15:27bnieuwenhuizen: yeah old mesa
15:27danvet_: bnieuwenhuizen, but that still goes boom
15:28danvet_: since we want to nuke the implicit modifier stuff eventually
15:28danvet_: I hope at least
15:28danvet_: otoh that's future platforms
15:28danvet_: robclark, think mesa shipped with steam != mesa on system
15:28robclark: not super familiar with libva, but can you do kms -> egl -> libva instead of kms -> libva?
15:28danvet_: or some fun like that
15:28danvet_: robclark, I guess for amdgpu this should work, since it's the same library underneath with the gallium state tracker
15:29danvet_: on intel it's a separate thing
15:29bnieuwenhuizen: danvet_: I think the solution to that is just dropping non-modifier modeset at some point for future platforms?
15:29danvet_: bnieuwenhuizen, yup
15:29danvet_: well, just the implicit magic get/set ioctls
15:29robclark:just waiting for someone to get around to adding video support to iris and let the whole separate libva go away ;-)
15:29danvet_: robclark, everyone is waiting for that
15:31bnieuwenhuizen: danvet_: anyway since mesa does not support modifiers yet for AMD I think breaking the non-modifier path is a bit early
15:32danvet_: bnieuwenhuizen, oh sure this is only for when it's all rolled out
15:32danvet_: for intel we waited like 5 years, not recommended to wait that long :-)
15:33danvet_: bnieuwenhuizen, I think for obs/ffmpeg the only thing we can do is a drm module option
15:33danvet_: to disable modifiers
15:33bnieuwenhuizen: the other alternative is explicitly setting DRM_FORMAT_MOD_INVALID in getfb(2) and keeping num_planes to 1
15:34danvet_: and tell people to boot with that if they mix modifier and modifier-unaware kms userspace and expect it to work
15:34bnieuwenhuizen: oh hmm, that doesn't work with modeset
15:34bnieuwenhuizen: and I assume unsetting DRM_MODE_FB_MODIFIERS in getfb2 isn't reasonable?
15:34danvet_: bnieuwenhuizen, you can't decide that either
15:34danvet_: or do you mean behind a module option too?
15:35bnieuwenhuizen: danvet_: nothing accepts modifiers = DRM_FORMAT_MOD_INVALID in the kernel yet as that is the end of the list in enumeration in the drivers
15:35bnieuwenhuizen: why can't you decide that
15:35danvet_: let me type a diff, I think we're talking past each other a bit
15:35bnieuwenhuizen: if DRM_MODE_FB_MODIFIERS isn't set in the modeset, don't set it in getfb2?
15:36danvet_: that breaks userspace which wants to use everything modifier or not modifier
15:36danvet_: like if you have a compositor that doesn't modifier
15:36bnieuwenhuizen: but they already have to deal with that?
15:36danvet_: and then switch to one that does
15:36danvet_: if getfb2 always gives modifiers, that's much nicer
15:37bnieuwenhuizen: it is nicer but the current (pre-modifier) AMD behavior is that it doesn't get set, so not setting it shouldn't break anybody?
15:37danvet_: plus I'm not seeing how that fixes the obs use-case when your compositor does have modifiers
15:37danvet_: bnieuwenhuizen, yeah, but then not enabling modifiers also doesn't break anyone
15:37danvet_: and all we get with this trick is move the regression from kernel to mesa upgrade
15:38danvet_: until someone has new mesa on old kernel
15:38danvet_: and then it's still the kernel upgrade
15:38danvet_: doesn't win
15:39danvet_: then tell people using ffmpeg/obs to set that
15:40danvet_: I don't think we can do better
15:40emersion: some compositors have already an option to disable modifiers
15:40danvet_: maybe ping obs/ffmpeg people to please add modifier aware paths
15:40emersion: wlroots and weston at least
15:40bnieuwenhuizen: well, we should change allow_fb_modifiers otherwise we still have the getfb2 issue :P
15:40bnieuwenhuizen: not just the cap
15:40bnieuwenhuizen: but yes point taken
15:40emersion: i guess if we really wanted to fix this we'd need to add a GETFB2_MODIFIERS client cap
15:41danvet_: bnieuwenhuizen, hm probably just a hack needed for getfb
15:41danvet_: and getfb2
15:41danvet_: and still let the driver work fully with modifiers internally
15:41danvet_: getfb1/2 simply report as if there's no modifier present
15:42danvet_: emersion, yeah but experience says it never happens
15:42danvet_: emersion, plus we'd break the people who are perfectly capable of modifier'ed getfb2 right now
15:42danvet_: like robclark said, getfb2 was added because modifiers
15:42emersion: fwiw i completely agree with you
15:44danvet_: so kernel modparam + making ffmpeg/obs aware that it exist with a plea to pls use it is about as good as we can do I think
15:44danvet_: bnieuwenhuizen, hm do you think we could limit the hack to getfb1/2?
15:44danvet_: to allow modifiers to be used?
15:45danvet_: it would only start falling apart once the implicit stuff is completely gone
15:45MrCooper: danvet_: I'd rather suggest explaining to them why this can't work reliably :)
15:45emersion: ffmpeg kmsgrab can already do getfb2 iirc
15:45bnieuwenhuizen: emersion: just but no modifiers :P
15:46emersion: hrm hrm
15:46danvet_: MrCooper, well thanks to implicit sync it'll work fairly reliably in practice
15:46danvet_: if you recheck that it's the same buffer handle I think it's even guaranteed
15:47danvet_: if you're not extremely slow and the compositor is pageflipping
15:47MrCooper: not really, the client may have already submitted another frame for that buffer
15:47danvet_: MrCooper, we only have a flip depth of 1
15:47MrCooper: in which case the captured video will stutter back and forth
15:47danvet_: and getfb doesn't give you the currently scanned out buffer
15:47danvet_: but the queued one
15:48bnieuwenhuizen: we have a flip depth of >1 for GPU render work though which does count for implicit sync
15:48danvet_: so if you do bufid1 = getfb(); queue_copy(); bufid2 = getfb() and throw the frame away if bufid1 != bufid2
15:48danvet_: then I think this is race free
15:48danvet_: for pageflipping compositor
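[Editor's sketch] The recheck danvet_ describes above (read the fb id, queue the copy, read the fb id again, discard the frame if the two ids differ) could look roughly like this. It is only an illustration of the control flow: `get_fb_id` and `queue_copy` are hypothetical callables standing in for whatever the capturing tool uses to query the current framebuffer and to submit a GPU copy, not real libdrm calls. Others in the channel point out cases where this check is not sufficient, so treat it as the proposal under discussion, not a guaranteed fix.

```python
def capture_if_stable(get_fb_id, queue_copy):
    """Return the copied frame, or None if the framebuffer changed meanwhile.

    get_fb_id  -- hypothetical callable returning the id of the queued fb
    queue_copy -- hypothetical callable that submits a copy of that fb and
                  returns a handle to the copied frame
    """
    before = get_fb_id()          # bufid1 = getfb()
    frame = queue_copy(before)    # queue_copy()
    after = get_fb_id()           # bufid2 = getfb()
    # Throw the frame away if a flip happened between the two queries.
    return frame if before == after else None
```

A capture loop would call this once per frame and skip any frame for which it returns None.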
15:48MrCooper: there's still a race: 1) capturer GetFB 2) DRM master submits flip 3) client submits next frame 4) capturer gets frame contents
15:48bnieuwenhuizen: though in the typical X11/wayland stack we don't start to submit new render work until the next frame is flipped to scanout due to protocol
15:49danvet_: MrCooper, the bufid recheck should catch that
15:49danvet_: not sure obs/ffmpeg does that
15:49bnieuwenhuizen: MrCooper: if you have vsync and the GetFB returns the queued frame you have like 16 ms (or whatever the freq is) to submit work
15:49bnieuwenhuizen: which can technically fail but should be easy to hit
15:49emersion: what if the compositor has 2 swapchain images, and the fbid recheck happens 2 frames after the first one?
15:50MrCooper: just let the display server do the capturing...
15:50bnieuwenhuizen: I think that as long as things are according to vsync any timing issues should be exceedingly rare in practice
15:51danvet_: emersion, yeah if you take more than a frametime to queue the copy and recheck, it'll fail
15:51danvet_: but in practice, you just queue stuff up
15:51danvet_: so in practice, this never fails
15:51bnieuwenhuizen: I think nobody here is arguing this is a particularly preferred way to capture
15:51emersion: maybe a game is running
15:51emersion: and has higher priority than the capture
15:52danvet_: emersion, doesn't matter
15:52danvet_: you might miss frames
15:52danvet_: but the amount of cpu work you need to do is very minimal
15:52emersion: i'm talking about the GPU scheduler
15:52danvet_: you only need to queue the copy to the kernel
15:52danvet_: not actually run anything
15:52danvet_: emersion, implicit sync will make sure your copy gets scheduled before the next frame from the game
15:53emersion: ah, right
15:53bnieuwenhuizen: yeah even with high prio you can't skip implicit sync
15:53danvet_: so you really only have to sneak the command submission in
15:53AndrewR: ...may be making some separate virtual v4l2 capture device from kernel gpu driver is not very bad idea?
15:53danvet_: it'll even avoid the horrible compression artifacts we see for frontbuffer rendering
15:54danvet_: since your copy will fit between 2 render jobs that flush all the caches
15:54danvet_: AndrewR, it's equally bad
15:54danvet_: compositor owns the flip queue, not the kernel
15:54danvet_: kernel just executes what the compositor wants
15:55danvet_: the only exception is if you have something like vkms
15:55danvet_: or writeback
15:55danvet_: vkms is dead slow because it's software
15:56danvet_: and writeback needs hw support, plus the compositor might want to use that too
15:56emersion: hm, what makes vkms/writeback special wrt. flip queue ownership?
15:56danvet_: emersion, we capture the actual output
15:56danvet_: and not a buffer
15:56emersion: ah, yeah
15:56danvet_: maybe if you queue up some in-kernel copies as part of page flips
15:57danvet_: but it all kinda gets nasty
15:57danvet_: and not any better than getfb behind the compositor's back I think
16:07jkqxz: I wrote ffmpeg kmsgrab. I just read the above discussion, and I'm not understanding what additional modifier support would be wanted on the ffmpeg side?
16:08dcbaker[m]: anholt: what was the rust library you wanted to use for option parsing? I'm going to try my hand at writing cargo integration in meson today and wanted something to start with
16:08jkqxz: (Certainly it is horrible on older kernels without GETFB2, but ignoring that.)
16:15danvet_: jkqxz, do you stuff the modifiers you get from getfb2 into libva?
16:15danvet_: or egl
16:17emersion: or vulkan (yikes)
16:17jkqxz: They do go into Vulkan.
16:17danvet_: then it should be all fine
16:17danvet_: maybe bnieuwenhuizen looked at the wrong thing ...
16:17emersion: … except nobody supports the vulkan extension for modifiers
16:18danvet_: bnieuwenhuizen, sounds like that's the mesa solution, you need to make sure you have modifier import support everywhere
16:18emersion: jkqxz: what happens on the EGL and vaapi paths?
16:18jkqxz: I don't think they go into libva, but they probably should. (No driver supported that. Intel was all implicit, AMD didn't have modifiers anyway.)
16:19emersion: libva has no api for modifiers, yet
16:19danvet_: emersion, I thought there was a shoddy one that wasn't good enough
16:20danvet_: it recently came up
16:20jkqxz: We don't have any GL internally, so no EGL. That would be fine, though - players like mpv do the right thing with them.
16:20danvet_: well a few months ago
16:20danvet_: in some internal discussion
16:21emersion: jkqxz: hm, what do you do if neither vulkan nor vaapi is supported?
16:21emersion: hm https://github.com/intel/libva/commit/f2ddc03d0b8f6ba3bb143a086687f1ad386046c6
16:22danvet_: do not ask me how to use that
16:22emersion: yeah that's what i found as well
16:22danvet_: it allows importing
16:22emersion: ah, maybe the issue was with export?
16:22jkqxz: emersion: kmsgrab is not required to connect to anything. The user gets the DRM fds and can do what they like with them.
16:22danvet_: but for getfb2 that's the best you can do anyway, if libva can't read your buffer you're toast
16:22bnieuwenhuizen: jkqxz: stuff like ffmpeg -device /dev/dri/card0 -f kmsgrab -i - -vf 'hwmap=derive_device=vaapi,hwdownload,format=bgr0' -c:v libx264 does import into libva
16:23bnieuwenhuizen: (and not specifying vaapi ends up memory mapping or something which doesn't work with non-linear framebuffers)
16:23danvet_: emersion, I guess mesa libva doesn't support that though
16:23emersion: doesn't support what?
16:24jkqxz: emersion: That change was me, because the Intel media people pushed a crazy half-baked interface so I responded with "why don't you just use the same structure as export" and we ended up with that. They then lost interest in importing modifiers and there was no implementation for a while, I don't know what the current state is.
16:24emersion: ah, the import modifier
16:24emersion: so, tl;dr is API is here
16:25emersion: but nobody implements it?
16:25emersion: (so like vulkan?)
16:25jkqxz: bnieuwenhuizen: Yes. Or you can use Vulkan or OpenCL (beignet) in the middle instead.
16:25jkqxz: bnieuwenhuizen: And yeah, you can just map directly to download if the fb is linear. That works fine on a lot of ARM SBCs.
16:27danvet_: we're changing that, with modifiers everywhere
16:27danvet_: anyway I guess kernel modparam to not enable modifiers in getfb is probably the best we can do here
16:28danvet_: jkqxz, yeah intel libva dropping the ball on modifiers is pretty embarrassing
16:29jkqxz: emersion: Huh, looks like it actually got added to the new driver a few months ago. <https://github.com/intel/media-driver/commit/6b3fa1fe8900885922d8b9fdffa9f1d1eddbffea>
16:29jkqxz: So maybe usable, I should try that.
16:30emersion: oh, nice!
16:34jkqxz: danvet_: Who would ever set that parameter?
16:34anholt: dcbaker[m]: structopt is a set of macros around clap and it's *so* nice.
16:35danvet_: jkqxz, users when stuff stops working
16:35danvet_: it's not great
16:36emersion: jkqxz: people complaining about screen capture not working
16:38jkqxz: So the idea is you set that parameter, then when someone asks the kernel later what modifiers it supports it says "nope, linear only plz", and GETFB continues to work because the scanout buffer will then never have a modifier?
16:38bnieuwenhuizen: well, pre modifier wasn't always linear ;)
16:38bnieuwenhuizen: but yes
16:38jkqxz: That feels like it will have weird other consequences because the display will have different behaviour.
16:39emersion: no modifier == implicit modifier
16:57jekstrand: Fixed memcpy. Now NIR isn't deleting my entire shader and it goes back to taking forever to compile. :-(
17:57danvet_: robclark, hm what's the deadlock with v4l?
17:57danvet_: at least for dma_resv_lock this should be handled with the ->pin callback at import time
17:59robclark: danvet_: need to drop lock before drm_prime_gem_destroy()
17:59robclark: because dma_buf_detach() tries to grab same lock ;-)
17:59robclark: (deadlock was in free_object path)
18:00danvet_: ah yes that'll give you fairly immediate death :-)
18:00robclark: maybe deadlock is the wrong term.. just double acquiring same lock
18:00danvet_: yeah I was worried for a bit that it's one of the nasty circular locking across subsystem ones
18:02robclark: no, just a silly thing that I didn't see in the dma-buf-export-to-v4l2 case
18:02robclark: but found something that was doing things the other way 'round
18:29jekstrand:might have to implement back-end function support... :-(
18:30bnieuwenhuizen: huh, why?
18:31jekstrand: I'm trying to compile some truly impressive OpenCL kernels. It's like a 10 minute compile time
18:31bnieuwenhuizen: and there is significant inlining redundancy?
18:31jekstrand: Not as impressive as luxmark4, though. These actually complete without OOM. :-)
18:31jekstrand: bnieuwenhuizen: Non-trivial, I think.
18:31jekstrand: bnieuwenhuizen: But I've not measured it
18:32bnieuwenhuizen: oof, OOMs
18:32jekstrand: Yeah, luxmark4 OOMs the system inlining functions. :-)
18:32jekstrand: These aren't that bad
18:48macc24: what's the state of dual-gpu support in mesa?
18:51bnieuwenhuizen: macc24: assuming you only want to render with a single GPU in a given application it can mostly work
18:51agd5f_: macc24, what do you mean? You can select between GPUs on multi-GPU systems if that is what you mean
18:51bnieuwenhuizen: think situations like a laptop with a dgpu and igpu/apu or something like that
18:51bnieuwenhuizen: but if you were thinking crossfire/SLI equivalent, then no
18:52agd5f_: also works with multiple dGPUs
18:52macc24: why no
18:52agd5f_: macc24, crossfire/SLI is not implemented. Lots and lots of work
18:54FLHerne: In principle, you could do alternate-frame rendering or so with weird hacks in the game engine :p
18:55FLHerne: (but this isn't a useful solution to any problem)
18:58bnieuwenhuizen: that typically gets way complicated due to inter-frame rendering dependencies
19:46airlied: jekstrand: i took a brief look at llvmpipe func support
19:46airlied: bit messy to do
19:48airlied: since i have to pass execution mask through everything
19:48airlied: and move some stack things to global vars
19:48airlied: so intrinsics can work in fns
19:49jekstrand: airlied: Yeah....
19:49jekstrand: airlied: We've got call/return instructions but I don't want to think about the RA implications.
19:50airlied: yeah my other option is add a bunch of stuff as func params
19:50airlied: that every call passes
19:50airlied: and add the actual fn params after
19:51jekstrand: Neither of those sound great TBH
19:51airlied: i also expect finding an inline threshold might be fun
19:51jekstrand: But LLVMpipe is probably the easiest driver to hook this up for....
19:52airlied: since there are definitely some fns we would want to remove
19:52jekstrand: My thought is to basically have a threshold where, once it gets so big, no more inlining.
19:52jekstrand: Unless it's used exactly once. Then you can inline it there.
19:52jekstrand: It'd probably have to be an iterative pass
19:53airlied: ive no idea how to make barriers work
19:53airlied: with llvmpipe coroutine launcher i use
19:53jekstrand: We'd also have to inline everything with barriers
19:53jekstrand: So they end up in the main function
19:53jekstrand: Maybe do that first
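[Editor's sketch] The heuristic jekstrand outlines above (force-inline anything containing a barrier first, otherwise stop inlining once things get too big unless the function is used exactly once, and iterate) can be modelled on a toy call graph. Everything here is an assumption made for illustration: the `Func` record, the names `inline_pass` and `find_call_to_inline`, and reading "so big" as "caller size plus callee size exceeds a threshold". It is not Mesa or NIR code, and it assumes a call graph without recursion.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Func:
    size: int                                  # instructions in the body
    calls: list = field(default_factory=list)  # callee names, one per call site
    has_barrier: bool = False


def call_counts(funcs):
    counts = Counter()
    for f in funcs.values():
        counts.update(f.calls)
    return counts


def prune_dead(funcs, entry):
    # Drop functions nobody calls any more; repeat because removing one
    # function can leave its callees without callers.
    while True:
        counts = call_counts(funcs)
        dead = [n for n in funcs if n != entry and counts[n] == 0]
        if not dead:
            return
        for n in dead:
            del funcs[n]


def find_call_to_inline(funcs, threshold):
    counts = call_counts(funcs)
    # Barriers first, so they all end up in the entry function.
    for caller in funcs.values():
        for name in caller.calls:
            if funcs[name].has_barrier:
                return caller, name
    # Then: inline if the callee is used exactly once or the result stays small.
    for caller in funcs.values():
        for name in caller.calls:
            used_once = counts[name] == 1
            fits = caller.size + funcs[name].size <= threshold
            if used_once or fits:
                return caller, name
    return None


def inline_pass(funcs, entry, threshold):
    """Inline one call site at a time until nothing qualifies (no recursion)."""
    prune_dead(funcs, entry)
    while (hit := find_call_to_inline(funcs, threshold)) is not None:
        caller, name = hit
        callee = funcs[name]
        caller.calls.remove(name)          # this call site disappears...
        caller.calls.extend(callee.calls)  # ...and the callee's calls move up
        caller.size += callee.size
        caller.has_barrier |= callee.has_barrier
        prune_dead(funcs, entry)
    return funcs
```

In this toy model a large function called from several places stays a real call, while single-use helpers and anything containing a barrier are folded into their callers. A real pass would also need a cost model for call overhead and register pressure, which the discussion leaves open.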
19:54airlied: yeah doing barriers a few fns deep seems insane
19:54jekstrand: I have a kernel here with 75K back-end instructions. On the upside, it only spills a handful of times.
19:55airlied: did you design ibc with this in mind? :-)
19:55jekstrand: Never mind... it spills like mad. :-/
19:55jekstrand: My spill counting is busted
19:56jekstrand:shakes his fist at the 13035 send instructions
19:58jekstrand: Looks like the OpenCL driver only spills 584 registers for this one. :-/
20:02airlied: i do wonder if a lot of cl kernels in the real world would run faster on cpus :-p
20:05bnieuwenhuizen: how many registers do you spill with corrected counting?
20:05AndrewR: airlied, you always can test this :}
20:08bnieuwenhuizen: assuming you use the register coloring helpers I wonder if we need to add live range splitting?
20:12jekstrand: bnieuwenhuizen: I'm not sure how many registers, but it's about 13k spill/fill instructions
20:14bnieuwenhuizen: but they could all be spilling/filling a common register :)
20:14bnieuwenhuizen: unless you are counting "SSA" registers. Then you can only spill once
20:15jekstrand: bnieuwenhuizen: I don't have good statistics on the stuff being spilled, sadly.
20:15jekstrand: Not yet anyway
20:17jekstrand: There's 137 NIR registers. That's not a good sign....
20:19bnieuwenhuizen: those are arrays and phi webs right? do we do live range checking for those?
20:19jekstrand: Our back-end does live range check them
20:21jekstrand:wonders how much of this garbage is uniform
20:50jekstrand:is tempted to make nir_algebraic work for intrinsics
20:51jekstrand: That sounds like a lot of work. :/
21:19jekstrand:doesn't like how much code he's writing for one "simple" intrinsic optimization. :-/
22:14jekstrand: Grr.... Someone's doing inclusive_scan(read_invocation(x))
22:15jekstrand: So either they mean shuffle and not read_invocation or else that scan is pointless.
22:15jekstrand: I think they mean shuffle