01:07robclark: alyssa: I think I have some a2xx device *somewhere* but probably not something that can run upstream kernel.. or get working in any reasonable amount of time even if I had a reasonable amount of time.. lumag has some imx5 things with a2xx working.. and possible flto does as well.. I suppose I can look at things and suggest things but wouldn't be too easy for hands on debugging
01:09lumag: robclark, alyssa: yep, I have a200 running on imx53, but I'm just diving into mesa part of it.
01:28gfxstrand: Does anyone know: Do the pixmark tests use ARB programs?
01:31gfxstrand: Never mind. I can't be bothered. I'll just put a CI patch on top of the MR.
01:39lumag: robclark, alyssa So, I can test the patchset, but it will probably take a long time for me to write it.
01:40lumag: (I likely won't have time for that until next weekend).
01:58alyssa: karolherbst: by "direct translate", I mean writing the simple patch to translate load_reg and store_reg intrinsics to moves
01:58alyssa: which for codegen should be just as good as what you do now
01:58alyssa: robclark: alright
01:59alyssa: I guess I can write the a2xx patches with my eyes closed and my hands behind my back
01:59alyssa: and lumag can test and you can review and probably be good enough
02:02lumag: good
02:09robclark: alyssa: let's try that.. I'll try to help you with eyeballs (and hopefully flto can do since he did a2xx nir conversion) and lumag can help with testing (at least until we can get some a2xx CI.. hopefully)
02:09alyssa: :+1:
02:41Company: is there a way to poll() a GLsync or vkFence?
02:42airlied: you can export a vk fence to an fd, VK_KHR_external_fence_fd not sure how it works with poll
02:46Company: interesting
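A minimal sketch of the poll() side, assuming the fence was exported as a Linux sync_file fd (VK_KHR_external_fence_fd with the sync-fd handle type) and that such an fd reports POLLIN once the fence signals. No Vulkan is involved here: the read end of an os.pipe() stands in for the exported fd, and wait_fd_signaled is a hypothetical helper name.

```python
import os
import select


def wait_fd_signaled(fd: int, timeout_ms: int) -> bool:
    """Return True if fd becomes readable within timeout_ms milliseconds."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)
    return any(revents & select.POLLIN for _fd, revents in events)


# Stand-in for an exported fence fd: a pipe that becomes readable on write.
read_end, write_end = os.pipe()

before = wait_fd_signaled(read_end, 0)    # nothing written yet ("unsignaled")
os.write(write_end, b"x")                 # simulate the fence signaling
after = wait_fd_signaled(read_end, 100)   # now readable ("signaled")

os.close(read_end)
os.close(write_end)
```

With a real exported fence fd, the same fd could be registered in an existing main loop next to Wayland/X11 fds instead of blocking in a fence wait call.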
04:20Company: so, for anyone following my "I wonder why my shiny new GTK vulkan renderer is so much slower than the GL renderer" adventure from a few days ago, I have 2 updates:
04:20Company: 1. it turns out on my discrete AMD, the results are quite different and it's holding up pretty well
04:21Company: It's also doing 2000fps (vs 2800 for GL) instead of the 200fps (vs 650) on my Intel
04:22Company: It's also using 100% CPU instead of 50%, because the AMD is so freaking fast that I'm actually CPU bound (and so is the GL renderer)
04:22Company: while on Intel I am GPU bound
04:22airlied: Company: do you thread anything btw
04:22airlied: ?
04:23Company: the GL renderer is not, and is slower than the Vulkan renderer if I turn off gallium's threads
04:23Company: airlied: no
04:23airlied: so threading vulkan is probably one way to get towards GL
04:23airlied: esp if you are CPU bound
04:23Company: there's a bunch of limitations inside GTK I need to fix first
04:24Company: mostly related to interaction with Wayland/X11
04:24Company: (and I feel I need to understand Vulkan/GL better to know how to do threading properly)
04:24Company: aaanyway, I just had a 2nd breakthrough
04:25Company: I replaced the solid color drawing that we do in lots of places with a vkCmdClearAttachments()
04:25Company: and my Vulkan fps went from 220 to 360
04:25Company: so yeah, turns out that matters
04:26Company: when we're in fact GPU bound
04:27Company: I suspect if I get that optimized into the clear color for the BeginRenderPass, there's a bunch more fps in there
04:29Company: if I comment out the vkCmdClearAttachments, I get 530fps
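The three fps figures above (220 with draws, 360 with vkCmdClearAttachments, 530 with the fill removed) are easier to compare as frame times. A rough sketch, assuming the renderer is fully GPU bound and that the per-frame costs simply add up, which is an assumption and not something the measurements prove:

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second into milliseconds per frame."""
    return 1000.0 / fps


draw_fill = frame_time_ms(220)   # solid color drawn with regular draws
clear_fill = frame_time_ms(360)  # solid color via vkCmdClearAttachments()
no_fill = frame_time_ms(530)     # solid color fill commented out

# Per-frame cost attributed to the fill itself, under the additive assumption.
cost_of_draw_fill = draw_fill - no_fill    # roughly 2.66 ms
cost_of_clear_fill = clear_fill - no_fill  # roughly 0.89 ms
```

Read this way, the clear is about three times cheaper than the draw but still costs close to a millisecond per frame, which is the part a render pass clear might remove.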
04:30airlied: oh yeah using a render pass clear would probably help
04:30Company: I just wasn't aware I can double my fps that way
04:30Company: so we seem to not be bound by shader complexity, but the amount of pixels we write
04:31Company: actually no
04:32Company: the amount of pixels is the same
04:32Company: so why is ClearAttachments() faster than Draw() if the shader complexity doesn't matter
04:32Company: is that using the blit engine instead of the shader engine or something?
04:33Company: because now I wonder if - especially for larger embedded videos - replacing texture draws with vkCmdBlitImage is worth the effort
04:34airlied: it all depends on the drivers
04:34airlied: but if you are doing a blit, using blit image should get the optimal path
04:35Company: yeah, guess I can only figure it out by trying
04:36Company: the problem is that those blits happen halfway through the renderpass and there's no vkCmdBlitAttachments()
04:37airlied: on radv you should hit fast clear paths on both clear image and subpass
04:37airlied: but i'm a bit rusty on the details
04:38Company: radv is not my primary concern, because that's usually a fast discrete GPU
04:39Company: my concern is internal gpus (read: Intel) and ultimately mobile
04:39Sachiel: intel will fast clear too, if the conditions are met
04:39airlied: mobile you definitely want render passes
04:42Company: the thing I'd ultimately want is (a) me being more of an expert on GPUs and (b) RPi and Librem users being happy, because those are the ones that use GTK
04:42Company: and I'd like them to not stay on GTK3 and having to use Cairo
06:02Company: I think medium-term the best thing I can do is split the render region by opaque subregions and render each individually
06:03Company: ie split the shadow from the window contents, because then the window contents are opaque and solid, so I can use the clear color there
06:03Company: and then again do the same with the content area and the headerbar
06:04Company: so that the content area doesn't do the clear with the window color, but with the content area's color
06:04Company: alternatively, if that works, I could split into tiles that match the tiling of the GPU and go from there
06:05Company: but no clue how tiling GPUs work
06:05Company: but I read somewhere that webrender does that
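The "split the shadow from the window contents" idea can be sketched as plain rectangle subtraction. This is only an illustration under simplifying assumptions: rectangles are (x, y, width, height), the opaque content rectangle lies fully inside the outer region, and rounded corners are ignored. split_around is a hypothetical helper, not GTK code.

```python
def split_around(outer, inner):
    """Return the rectangles covering outer minus inner.

    Rectangles are (x, y, width, height); inner must lie within outer.
    """
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    pieces = [
        (ox, oy, ow, iy - oy),                    # strip above inner
        (ox, iy + ih, ow, oy + oh - (iy + ih)),   # strip below inner
        (ox, iy, ix - ox, ih),                    # strip left of inner
        (ix + iw, iy, ox + ow - (ix + iw), ih),   # strip right of inner
    ]
    # Drop empty strips (inner touching an edge of outer).
    return [r for r in pieces if r[2] > 0 and r[3] > 0]


# A 100x100 surface with a 10 pixel shadow margin around opaque contents.
surface = (0, 0, 100, 100)
contents = (10, 10, 80, 80)
shadow_rects = split_around(surface, contents)
shadow_area = sum(w * h for _x, _y, w, h in shadow_rects)
```

The opaque contents rectangle could then get its own render pass with the clear color set to the window color, while the shadow strips keep blending.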
07:23MrCooper: Company: maybe GTK should use Wayland sub-surfaces for drop shadows? E.g. can't really use 10 bpc surface formats otherwise
07:24Company: ugh
07:25MrCooper: :)
07:25Company: that has multiple issues, starting with the complexity of doing it, ending with how to do it for rounded corners
07:25Company: input handling etc
07:27Company: it's much easier to just do multiple renderpasses
07:27Company: until I emit command buffers that are too big so that that starts impacting performance
07:28Company: I don't think I've encountered that yet
07:29Company: batching individual vkCmdDraw() calls into multi-instance calls didn't make a difference at least
07:29emersion: render passes won't help with the fact that 10bpc formats don't have a proper alpha channel
07:30Company: that's true
07:30emersion: so, drop shadows can't be painted on a 10bpc buffer, no matter what
07:30Company: though I don't think GTK wants to use 10bpc formats anyway
07:30emersion: no color management?
07:31Company: the goal is float16 for that
07:31emersion: right
07:31emersion: that'd make things easier, although consume more memory and bandwidth
07:31Company: I don't think that will matter much
07:31emersion: i haven't really tested how bad is fp16 compared to 10bpc
07:32Company: because we need the float16 internally anyway for compositing
07:32Company: well, no idea actually, I didn't try either
07:32Company: but current GTK already switches to float16
07:32Company: and nobody has complained yet
07:33Company: if you load a 16bit png
07:34Company: well, nobody is not quite true
07:34Company: there was an AMD bug and when people scrolled their app list in gnome settings suddenly their screen got corrupted
07:34Company: because one or two apps accidentally had 16bit icons
07:35Company: and GTK happily switched to make those beautiful
07:41MrCooper: beautifully downsampled to 8 bpc in the end :)
07:42Company: it was good to know that compositors (or at least mutter) handled it fine
07:59linkmauve: Hi, I build Mesa with -Dglx=disabled, which produces no libGL.so, but the symbols are still available when loaded through EGL, is there a way to completely disable GL support to leave only GLESv2?
08:00HdkR: linkmauve: -Dopengl=false in meson?
08:23pq: sima, dottedmag, I would expect DIRTY_FB to be very useful with USB (literally USB and not USB-C DP mode) and paravirtualized display drivers.
08:26emersion: pq, are you thinking of FB_DAMAGE_CLIPS maybe?
08:37pq: emersion, maybe I am, I need to check what DIRTYFB was then
08:38pq: is DIRTYFB not the legacy UAPI version of FB_DAMAGE_CLIPS?
08:41emersion: drm_atomic_helper_dirtyfb() seems to implement DIRTYFB via FB_DAMAGE_CLIPS
08:41javierm: pq: yes, DRM_IOCTL_MODE_DIRTYFB is the legacy one
08:41emersion: however, i don't think (?) atomic supports front-buffer rendering?
08:41pq: I
08:42pq: I'm not thinking about front-buffer rendering at all.
08:42emersion: right, but daniel said DIRTYFB was used for front-buffer rendering
08:42pq: I thought you could mark damage with flip in the legacy UAPI too, but maybe you can't?
08:42pq: drm_mode_fb_dirty_cmd does contain fb_id, so it seems like it could work
08:43javierm: pq: you can but there are some corner cases that don't work. For example, you can't combine DRM_IOCTL_MODE_DIRTYFB with DRM_MODE_PAGE_FLIP_ASYNC
08:43emersion: it seems like drm_atomic_helper_dirtyfb() performs an atomic commit with only FB_DAMAGE_CLIPS filled
08:43pq: ok
08:44javierm: pq: I tried to add support for damage clipping to mutter's legacy KMS code path and ran into issues: https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2979
08:44pq: emersion, sounds broken, does it not take the fb_id into account?
08:44emersion: it uses the fb_id only to find the right plane
08:44pq: and if the fb_id is not on any plane right now?
08:45emersion: it commits nothing
08:45emersion: rather, commits with no change
08:45pq: I am thinking of: plane has FB1; DIRTYFB FB2; PageFlip plane to FB2
08:45emersion: right, that sounds like a no-op with that helper
08:46pq: unless PageFlip to atomic converter looks at the FB damage rects?
08:46emersion: with that helper, the DIRTYFB region is lost
08:47emersion: if no plane is using the FB
08:47sima: pq, dirtyfb is _only_ for the frontbuffer rendering flush, nothing else
08:47sima: if you page flip you want the FB_DAMAGE one to minimize upload
08:47pq: emersion, that would explain why javierm did not get it to work.
08:48sima: for usb/spi/i2c panels/outputs but also for integrated dsi panels that have selective upload
08:48emersion: okay, that makes sense
08:48sima: plus also all the virtual displays tend to benefit too (since they need to copy)
08:48emersion: sima, so does atomic support front-buffer rendering?
08:48sima: internally dirtyfb is implemented for atomic drivers as a pageflip of the frontbuffer using the damage prop
08:48sima: emersion, sure
08:48pq: sima, even with legacy UAPI? How do you use FB_DAMAGE_CLIPS with legacy?
08:49emersion: and the way to do it is via a commit which updates FB_DAMAGE_CLIPS and nothing else?
08:49sima: you could essentially emulate dirtyfb with an atomic flip with same fb_id but damage rect (without damage rect it's an implied full damage) and whatever else you want to change
08:49emersion: sima, but without damage rect, my atomic commit is empty…
08:49sima: pq, what kind of legacy do you mean? like legacy (non-atomic) driver, or using the legacy SETCRTC ioctl, or just the legacy DIRTYFB ioctl?
08:50sima: emersion, hm it shouldn't be ...
08:50sima: maybe you need to ask for an event to trigger the upload
08:50pq: sima, using non-atomic UAPI with PageFlip
08:50linkmauve: “10:00:21 HdkR> linkmauve: -Dopengl=false in meson?”, that disables all GL, GLESv1_CM, GLESv2, while I still want that third one.
08:50emersion: maybe i can set the FB to the same ID as the previous one?
08:50sima: pq, if you use both page flip and dirtyfb legacy ioctl you're very confused ...
08:50pq: sima, I mean non-atomic userspace, not driver
08:51sima: emersion, yeah you need to set something to include the crtc in the commit so that the event flag actually does something
08:51emersion: right
08:51daniels: pq: drmModeDirtyFB is literally that 'here's a dirty region for the same buffer' ioctl for legacy userspace
08:51sima: pq, yeah still, pageflip is for background rendering and flipping
08:51pq: sima, why would I be? I just want the flip to have damage rects. So how would I use FB_DAMAGE_CLIPS?
08:51sima: dirtyfb is for frontbuffer rendering, single buffer
08:51sima: if you mix them, your compositor is confused
08:51emersion: tbh would've been nice to have a "plz send page-flip event" CRTC prop
08:51emersion: instead of a global flag
08:52sima: pq, you can't flip with damage rects with legacy ioctl, you can only use atomic for that
08:52pq: sima, I'm not talking about a compositor but a random old KMS app.
08:52daniels: pq: you can't attach a damage rect to a different-buffer flip with legacy userspace
08:52sima: legacy pageflip ioctl is always implied full damage
08:52daniels: I think there's some kind of new API that fixes this, not sure tho
08:52sima: and since legacy isn't atomic you can't change both the fb_id and damage prop together
08:52sima: daniels, atomic
08:52pq: sima, ok, so non-atomic flipping + USB display = sadness, got it.
08:52sima: it's for atomically changing more than one prop :-)
08:53sima: pq, yup
08:53pq: daniels, why is the dirty_cmd carrying fb_id then?
08:53sima: pq, because uapi design lols
08:53pq: :-P
08:53daniels: sima: oh is that what it's called!
08:53sima: it really should be the crtc, but we screwed up (I think because Xorg)
08:54emersion: daniels, you should really take some KMS lessons
08:54sima: pq, so the kernel actually has to do some really stupid stuff to figure out the right crtc, which means we need to take more locks than the atomic ioctl equivalent
08:54sima: which is why you should use the atomic ioctl even if you do frontbuffer rendering (if available)
08:54sima: because that's less stupid uapi
08:54pq: looks like I'm not the only one misled by how that old UAPI looks
08:55sima: pq, yeah maybe some docs would be good for this one ...
08:55sima: pq, looking at drm_atomic_helper_dirtyfb we try to be clever and unlock the locks we don't need again
08:56sima: but we still have to iterate over all planes and temporarily lock them because the fb_id is just the wrong parameter for damage rect upload
08:56pq: sorry for the noise :-)
08:57sima: pq, so yeah summary: fb_id in dirtyfb is only used to find the crtc for this frontbuffer (all of the crtc if you clone like Xorg does), it's a no-op for a backbuffer
08:58sima: and then it does a damage_clip prop set on that plane/crtc combo
08:58sima: oh s/crtc/plane really, but dirtyfb goes back when crtc implied primary plane so some old drivers only flushed the primary plane and nothing else
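A toy model of the DIRTYFB behaviour summarized above, written only from what the chat says: the fb_id is used to find the planes currently scanning out that framebuffer, the damage clips are applied to those planes, and nothing happens if no plane uses it. The dict-based plane state and the dirtyfb function are hypothetical stand-ins, not kernel code.

```python
def dirtyfb(plane_fbs, fb_id, clips):
    """Return {plane: clips} for every plane currently showing fb_id.

    plane_fbs maps plane name -> fb_id currently attached to that plane.
    An empty result means the call was a no-op and the damage is lost.
    """
    return {
        plane: list(clips)
        for plane, attached in plane_fbs.items()
        if attached == fb_id
    }


planes = {"primary": 1, "cursor": 7}   # primary plane shows FB1
damage = [(0, 0, 64, 64)]              # one damaged rectangle

# Frontbuffer rendering: FB1 is on the primary plane, so it gets flushed.
flushed_front = dirtyfb(planes, 1, damage)

# pq's scenario: DIRTYFB on FB2 (a backbuffer), then a legacy page flip to FB2.
# FB2 is on no plane at DIRTYFB time, so the damage goes nowhere; the later
# legacy flip is treated as full damage regardless.
flushed_back = dirtyfb(planes, 2, damage)
```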
09:00emersion: sima, if i involve a CRTC in a page-flip commit without updating any plane, the driver assumes all planes have full damage?
09:01emersion: or do i need to involve the planes in the atomic commit for that?
09:02emersion: by "involve", i mean including a no-op property change, e.g. CRTC_ID or FB_ID to the same value as the previous one
09:04sima: I'm trying to find the code, because I thought yes
09:04sima: but it's been a few years ...
09:06sima: emersion, ok so the implied damage is only for the planes you're adding
09:06sima: so if you do a pure crtc only commit, then nothing gets updated
09:06sima: if you add a plane (even if it's just with the same fb_id) that plane gets an implied full damage
09:06sima: unless you set the damage prop, then it's limited to that
09:07sima: other planes don't automatically get any implied damage so that the legacy ioctl (which are all incremental) don't result in unnecessary uploads
09:07sima: e.g. if you have an overlay that you update with the legacy setplane ioctl
09:07sima: don't want to update the entire fullscreen desktop frontbuffer too
09:07sima: or if you move/change the cursor
09:08sima: emersion, if it's a modeset then the helpers by default add all planes, but that's really just convenience for drivers since most drivers want to fully disable/restore planes in that case
09:09sima: (e.g. runtime pm tends to shred register state, and hw tends to get pissed if you shut down the crtc with planes still running)
09:09sima: but that's really an implementation detail of "modesets tend to require full upload"
09:10sima: note actual modeset from the driver pov, not just setting the ALLOW_MODESET flag, that does nothing wrt what the kernel actually commits to hw
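The implied-damage rules sima describes can be written down as a small model. It is a sketch of the rules as stated in this chat, not of the kernel helpers: a plane included in the commit gets implied full damage unless FB_DAMAGE_CLIPS is set, planes not in the commit get none, and an actual modeset pulls in all planes. Treating a modeset as full damage even when clips are set is an assumption of this sketch. PlaneUpdate and effective_damage are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # x1, y1, x2, y2
FULL = "full"                     # marker for implied full-plane damage


@dataclass
class PlaneUpdate:
    """A plane included in an atomic commit (even with an unchanged FB_ID)."""
    plane: str
    damage_clips: Optional[List[Rect]] = None  # None: FB_DAMAGE_CLIPS not set


def effective_damage(updates, all_planes, modeset=False):
    """Map each plane to FULL, a list of clip rects, or None (not uploaded)."""
    result = {name: None for name in all_planes}
    for upd in updates:
        if upd.damage_clips is None:
            result[upd.plane] = FULL          # implied full damage
        else:
            result[upd.plane] = list(upd.damage_clips)
    if modeset:
        # Assumption: a real modeset adds all planes with full damage.
        result = {name: FULL for name in all_planes}
    return result


all_planes = ["primary", "overlay", "cursor"]

crtc_only = effective_damage([], all_planes)
same_fb = effective_damage([PlaneUpdate("primary")], all_planes)
clipped = effective_damage(
    [PlaneUpdate("primary", [(0, 0, 32, 32)])], all_planes
)
full_modeset = effective_damage([], all_planes, modeset=True)
```

In this model a CRTC-only commit uploads nothing, which matches the advice later in the chat that userspace tracking damage should always set FB_ID and FB_DAMAGE_CLIPS together.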
09:19pq: sima, implied full damage; oh gosh, even more special cases to the "kernel ignores no-op changes" doc.
09:20pq: it's starting to sound like the MODE_ID example in the doc is actually a special case, not the rule
09:21pq: I'm starting to see the confusion some kernel driver devs have about setting an already set value in a commit
09:21pq: maybe you should rewrite the doc about the no-op handling
09:22sima: pq, hm ...
09:22pq: I can't know all the details
09:22sima: well the kernel tries to no-op out expensive stuff
09:22sima: but the minimal update (especially when you ask for an event) is a page flip, which takes a vblank
09:23pq: from you I just now understood that e.g. if a plane has blend mode X, and I submit an atomic commit that has nothing but sets the blend mode to X again, it will imply full damage.
09:23sima: so for these there's kinda a lower bound of how much no-op optimizations you can do, since you still have the vblank delay anyway
09:23sima: pq, hm
09:24sima: pq, that's maybe not the intention, but maybe the effect
09:24pq: yeeeah
09:24sima: pq, yeah we'd need to handle that in the code, so that only if fb_id was touched we have the implied full damage maybe?
09:25sima: since it's all shared code it's a handful of lines in two places only
09:25pq: sounds better to me
09:25eric_engestrom: linkmauve: I think HdkR is right, `-D opengl=false -D gles1=disabled -D gles2=enabled` should give you what you want; doesn't it?
09:25sima: and then document that touching fb_id means full implied damage
09:25sima: pq, the problem is a bit implementation cost, since we need to make sure that really nothing else was touched
09:26sima: like if you do change blend mode, we do need that full implied damage
09:26sima: so ... seems a bit brittle to me from correctness pov
09:26pq: right
09:26pq: I'm still not sure if setting the same FB_ID again should imply full damage, but either way, the result is that userspace should always set FB_ID and FB_DAMAGE_CLIPS together to get an explicit result.
09:27sima: otoh the intersection of "plane with blending modes" and "plane that cares about damage in that way" is zero
09:27sima: pq, yeah that's best
09:27sima: like if your userspace tracks damage, always set it
09:27sima: otherwise the kernel needs to make some defensive assumptions
09:27pq: right, but the blending mode was just a random example for any KMS property, not blending mode specifically - just something not FB_ID
09:28sima: hm if the driver doesn't catch blend changes as full damage for dsi manual upload then it's buggy anyway
09:28sima: so maybe this wouldn't be brittle
09:29sima: atm we don't have a helper for damage for manual upload where checking everything else matters, no one refactored that out from drivers yet
09:29sima: pq, so I think we could clarify this to "only fb_id results in implied full damage" if that's an ask from compositors
09:53Company: so, final verdict of the last 5 hours of experiments: making sure that the window background is drawn with vkCmdClearAttachments() and even adding workarounds to draw the rounded parts with shaders (I now hate designers and in particular Apple again), gave me 350fps instead of 220fps on my GPU-bound Intel
09:54Company: and my CPU-bound AMD desktop keeps its 1900 fps but the shader clock according to radeontop is now at 1.7GHz from 2.1GHz before
09:58linkmauve: Company, reminds me circa GTK 4.2 when I discovered how round shaders were the reason GTK was horribly slow on Lima.
09:59linkmauve: And libadwaita uses them everywhere…
10:05sima: emersion, feel like typing up some of the conclusions from this chat into uapi docs? I guess for dirtyfb and some pointers/links to the damage prop
10:05sima: not sure how/whether to tie it into pq's atomic uapi doc ...
10:05sima: maybe a note that currently we have a lot of implied damage and so compositors should always set the damage prop if they track it?
10:15pq: sima, as a Wayland compositor dev, I'm partial to favor Waylandish semantics: nothing implies any damage, but the FB in use can read at any time and on any part, so all of it must always have good contents. I understand that KMS needs some implied damage to keep old stuff working.
10:15pq: *can be read
10:16sima: "fb can be read at any time" we already have
10:16linkmauve: eric_engestrom, HdkR, you were both right, it was the “ (all versions)” part which confused me in meson_options.txt.
10:16sima: the other part I'm not exactly sure what you mean with "implies any damage"?
10:18pq: sima, damage is a request to KMS to read. Without damage, KMS can assume that things did not change so there is no client-side reason to read and KMS can read only the minimum absolutely necessary.
10:18pq: sima, it only makes sense outside of the continuous re-scanning of FB.
10:18sima: yeah I mean all of this only makes sense if you don't have continuous rescanning
10:19eric_engestrom: linkmauve: ah I see, indeed "Build support for OpenGL (all versions)" can be interpreted as also including GLES. I'm not sure how to phrase it better? "Desktop OpenGL (all versions)" maybe?
10:20sima: pq, and yeah for backwards compat we have to assume for at least some cases that "no explicit damage" means "implied full damage"
10:20sima: instead of what I guess wayland does, which is "no damage means no damage"
10:20sima: pq, ^^ that correct?
10:21pq: correct
10:21linkmauve: eric_engestrom, that would be better yes, but why does it mention versions here? AFAIK there is no mechanism to disable OpenGL versions at compile-time.
10:21linkmauve: Only some environment variable to make the application believe it has a specific version, and possibly some driconf as well.
10:22pq: I see that FB_DAMAGE_CLIPS already specs that the whole FB must have valid contents regardless, so that's taken care of too.
10:23eric_engestrom: linkmauve: fair point, I don't recall building only part of desktop gl ever being a thing; maybe it was worded like that because the two other options are "OpenGL ES 1.x" and "OpenGL ES 2.x and 3.x", so they wanted to be clear that for desktop gl the option covered all versions without this split?
10:24eric_engestrom: maybe removing "(all versions)" is best
10:26pq: eric_engestrom, while you're at it, could I ask to maybe improve the doc for "gallium off-screen rendering" as well? I would not have guessed it referred to OSMesa kind of things, and was puzzled why it was off when surfaceless platform was enabled.
10:30pq: and yeah, I too have always assumed that disabling "OpenGL (all versions)" would disable all GL ES too.
10:32eric_engestrom: linkmauve: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24113
10:32eric_engestrom: pq: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24114
10:36pq: thank you!
10:41penguin42: karolherbst: I threw in a merge request for my profiling code in the early hours, it seems to give believable numbers from my smoke test
10:48Company: linkmauve: does lima have Vulkan?
10:49linkmauve: No it doesn’t, only GLES 2.0 and some kind of emulated GL 2.1.
10:50Company: I'm wondering what the best mobile GPU is for testing Vulkan rendering
10:51Company: like, in free drivers support, in sanity and in different features
10:51linkmauve: Best is relative, but the most mature one is probably turnip.
10:52linkmauve: It implements all of Vulkan 1.3 at least.
10:52linkmauve: (On recent hardware.)
10:53Company: I'm wondering about getting something I can easily throw GTK at to get a feel for how it performs
10:54Company: which kinda means installing Fedora
10:54linkmauve: https://mesamatrix.net/ might help you pick a driver.
10:54Company: so I guess RPi4?
10:55linkmauve: Company, rpi4 is v3dv, which is the only other driver having reached Vulkan 1.0.
10:55linkmauve: panvk and pvr haven’t yet, and nvk didn’t run last I tried it.
10:55linkmauve: Maybe I could try again. :)
10:56Company: tu even does descriptor_indexing
10:56Company: oh, so does llvmpipe since 2 weeks ago
10:56linkmauve: lavapipe* :)
11:05pq: is there anything similar to mesamatrix that maps driver names to actual hardware?
11:06pq: even just chip names would help
11:15emersion: eh, i missed all the fun
11:16emersion: sima, yeah, i'll try to type some docs…
11:21jannau: pq: https://docs.mesa3d.org/systems.html and follow the links
11:21linkmauve: pq, in the past the wiki was used for that, but it isn’t necessarily up to date, and each driver has different page layouts making it kinda hard to figure them out.
11:22jannau: I'm not aware of anything more concise
11:22emersion: pq, do you have a specific vendor in mind?
11:23emersion: ah, no, since it's about driver names
11:23emersion: (AMD has a hardware info page, for instance)
11:24emersion: (https://www.x.org/wiki/RadeonFeature/)
11:27pq: emersion, the most recent tangible question was "which driver would my Haswell need?"
11:28pq: so I just built both crocus and iris, as I couldn't tell which, and only the documentation of the amber branch (which I do not want to use) told me what to use instead of i965
11:29pq: that link to the Intel website is utterly unhelpful
11:31pq: I could of course just build everything, check which one actually gets used, and then trim the build config
11:32kode54: You need hasvk
11:32kode54: Oh
11:32pq: I was thinking of GL ES 2
11:32kode54: Opengl
11:32kode54: Don’t think they split Iris
11:32pq: but yeah, I don't think the vulkan driver set is much clearer either
11:33pq: what's crocus then?
11:33kode54: Not sure
11:33pq: :-)
11:36emersion: pq, https://wiki.archlinux.org/title/OpenGL#Installation
11:36emersion: it would be quite nice to have this in the mesa docs…
11:36emersion: there is https://docs.mesa3d.org/systems.html
11:37emersion: ah, already linked
11:37emersion: (but yeah, not very useful as is)
11:38pq: ah, Arch docs to the rescue - again. It seems like Arch always has the docs for anything related to administrating a pet Linux box. :-)
11:38daniels: pq: in short, hsw needs crocus
11:41pq: I see, thanks.
12:26alyssa: pq: doc patches welcome
12:33Company: linkmauve: so I guess I want to get me some snapdragon minipc to get a mobile GPU - but Google says I probably shouldn't do that because installing Fedora on those things is more likely to fail than succeed?
12:35linkmauve: I don’t know about Fedora, but I never had any issue running ArchLinuxARM on my Dragonboard 410c (which doesn’t do Vulkan).
12:36linkmauve: Better ask turnip people for which hardware would be best for your usecase, for instance robclark.
12:36alyssa: a6xx
12:49Company: yeah, seems like it needs a6xx or a7xx GPUs, but I have exactly zero clue where to get some device with that
12:50robclark: Company: yeah, defn recommend something a6xx.. there are some chromebooks which are inexpensive and have had upstream kernel support since the beginning.. or for something faster the thinkpad x13s is pretty nice (and my current daily driver).. you can get fedora running on either, but currently takes a bit of work (chromebooks because of the different boot flow.. the windows snapdragon laptops are a bit more normal in that
12:50robclark: regard.. I'm using systemd-boot on the x13s). I think all the essentials landed in v6.5 merge window so possibly we'll see fedora installer support on the x13s soon.
12:50Company: I ideally want something I can plug into a wall, install Fedora on, and connect via ssh whenever I want to test stuff
12:51robclark: the aarch64 .raw.xz image is useful for "manually" installing on things which fedora .iso image doesn't support.. just create loop device and dd over the root/home partition
12:51linkmauve: Although, if your target is Raspberry Pi and Linux phones, getting a high end Snapdragon might not be the best thing.
12:51Company: robclark: those things are expensive
12:52linkmauve: Have a look at the PinePhone Pro for instance, it will use panfrost/panvk and be a normal Linux phone.
12:52robclark: yeah.. but also having 32GB of RAM and an nvme is nice
12:52robclark: but for less expensive, chromebook is a good option
12:53Company: I just want to test-drive GTK on it, so cheap is good
12:53kisak: oh hey, mesa 23.2-branchpoint is scheduled for today.
12:54Company: otoh, if it comes with all the hardware and the hardware works, that's nice, too
12:54javierm: Company: I've the X2 Chromebook and mainline (and Fedora support) is pretty good. There are other fedora devs like eballetbo and pbrobinson that also use it so we can assist with the installation :)
12:54_jannau__: how close is MS' windows dev kit 2023 to the thinkpad x13s?
12:55javierm: Company: I documented a few months ago the steps I followed to install Fedora on my X2: https://blog.dowhile0.org/2022/11/04/how-to-install-fedora-on-an-hp-x2-chromebook/
12:55javierm: zmike told me that this process currently leads to a blank screen, need to dig to figure out what regressed
12:55dschuermann: are there any objections against https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22339 ? I would really like to land it
12:56robclark: _jannau__: it would really just need someone to put together devicetree for it.. it's come up a few times on #aarch64-laptops but not sure if anyone has made any progress yet.. pretty sure someone at least dumped the acpi tables
12:57robclark: but pretty much all of the actual driver work for x13s would apply equally for the dev kit
12:57Company: that devkit is $200 new, so that's a lot less than all the laptops
12:58Company: but I take it that that's not plug and play yet
12:58_jannau__: where? I think it should be $600 but comes with 32GB ram and nvme as well
12:59Company: I was looking at the ECS LIVA Mini Box QC710
12:59Company: which Google claimed was 219 new when it was still sold
13:01_jannau__: I meant this https://www.microsoft.com/en-us/d/windows-dev-kit-2023/94k0p67w7581
13:02Company: that's an 8c, mine's a 7c
13:03Company: the x2 chromebook should be a 7c, too, which is why I ended up at that mini box
13:03robclark: not sure if anyone is looking at the 7c dev kit
13:03robclark: there is someone looking at a 7c win laptop.. but I think that is still WIP
13:03alyssa: _jannau__: eyeing greener pastures :~P
13:03javierm: Company: if you want a QC machine, I would limit to either the X13s or X2 Chromebook
13:03alyssa: ?
13:04javierm: Company: anything else is choose your own adventure with downstream forks and whatnot
13:04robclark: (other 7c chromebooks should be easy too.. they are all pretty similar hw wise)
13:04Company: I think I'll go with "try again in a few months"
13:04javierm: robclark: right, I should have said 7c chromebooks rather than X2
13:05Company: or convince RH to buy me something
13:06Company: $200 is okay for something to play with $500+ is a bit much if all I do is compile GTK on it every 3 months
13:06javierm: Company: if you just need an aarch64 machine with a well supported GPU, then you can just go with a cheaper board with Mali as others suggested
13:07javierm: for example the rockpro64 that has a rk3399 SoC and is well supported in mainline and Fedora
13:07alyssa: no Vulkan though
13:07javierm: alyssa: right
13:07alyssa: GL3.1 and GLES3.1, not conformant on anything
13:07alyssa: s/anything/either/
13:07alyssa: though GLES3.1 is "close"
13:07Company: yeah, I ideally want something with Vulkan
13:08alyssa: RK3399 remains my personal laptop (with the M1 for work).. soft spot in my heart
13:08Company: because then I can experiment with the hardware
13:08alyssa: Company: for arm64 + mature Vulkan, one of the recommended a6xx machines is the way to go I think
13:08javierm: Company: another option is one of the boards with a TI SoC since there are DRM and Vulkan drivers in the works
13:08alyssa: also not mature though
13:09alyssa: turnip is years ahead
13:09mripard: RaspberryPi + an AMD GPU? :P
13:09Company: mripard: I want a mobile gpu
13:09alyssa: won't you have PCIe trouble too?
13:10mripard: well, it will be "mobile"
13:10Company: my idea was to have 3 kinds of gpus: discrete, integrated and tiling
13:10mripard: alyssa: yeah PCIe is going to be limiting
13:12alyssa: ok, still need to figure out who's going to convert broadcom and lima
13:13linkmauve: alyssa, what needs converting?
13:13emersion: Company: multi-GPU? :D
13:13emersion: also split render/split is good to have
13:13emersion: split render/display*
13:13alyssa: linkmauve: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9051
13:13emersion: and display-only like displaylink
13:13emersion: hm, not for GTK i suppose
13:14Company: GTK cares a bit about switching GPUs
13:14Company: that's why I bought my desktop CPU with an embedded GPU
13:14alyssa: Company: you must be perfect for social gatherings,
13:14Company: which I promptly deactivated in the bios
13:14alyssa: for when people want to have Company over
13:14alyssa: =P
13:14Company: alyssa: that's where the nick comes from!
13:14alyssa: =D
13:14alyssa: I love fun nick stories ;D
13:15alyssa: So it's not Company & Company? :=p
13:15Company: was a co-op account, and when my regular nick wasn't available on battle.net anymore in 1999, I took this one
13:15alyssa: =)
13:22Company: with lavapipe, my Vulkan renderer that does 350fps on Vulkan does 15.5fps
13:22Company: also, no visual bugs that I can see
13:23emersion: s/Vulkan/real hw/ i assume?
13:23Company: that's nice because it means CI can soon run the Vulkan tests
13:23zamundaaa[m]: Company: if you care about switching between GPUs, does that mean that GTK will also eventually gain support for surviving GPU resets?
13:23Company: emersion: my Intel TigerLake
13:23emersion: lavapipe is still Vulkan ;)
13:24Company: yeah, there was a lot of "Vulkan" in that sentence
13:25Company: zamundaaa[m]: GPU resets shouldn't be that much of a problem because you can switch GTK renderers at runtime
13:26Company: zamundaaa[m]: but nobody looked at it, because people's GPUs don't typically reset
13:26emersion: you should try amd :P
13:27Company: the thing GTK can't and doesn't want to deal with is compositor resets
13:27linkmauve: Or lima.
13:27zamundaaa[m]: Company: they definitely do reset, it's (sadly) not that uncommon
13:27Company: emersion: my desktop seems to still be working :p
13:28kchibisov: it would be nice to handle gpu resets if only compositors wouldn't die along the resets...
13:28kchibisov:had to change amd gpu to intel due to amd having reset/hard reset every 4-8 hours.
13:28Company: GTK's renderers should be fine with restarting, the only issue we have is that our GdkTexture objects have no way to recover
13:29Company: and they're meant to be immutable
13:29zamundaaa[m]: kchibisov: You just have to use the right compositor ;)
13:29Company: but that's a special case for when we get external textures
13:30Company: like GStreamer or spice
13:30kchibisov: Do you mean kwin, zamundaaa[m] ? I want to test handling of GPU resets in my client.
13:30kchibisov: In general I do wonder how should I test it, I was made aware of AMDGPU option to trigger reset, but my compositor dies with that.
13:31Company: kchibisov: how?
13:31kchibisov: Company: how to trigger a reset?
13:31Company: just so I know how to do it should I ever want to
13:31Company: yeah
13:32zamundaaa[m]: kchibisov: yes. KWin 5.27 is mostly robust against GPU resets (in real world terms it handles them fine), and git master shouldn't crash even in worst case situations
13:32kchibisov: zamundaaa[m]: do you know a setup where I could test all of that? I have kwin only in qemu...
13:32zamundaaa[m]: kchibisov: Note that with AMD you need to have https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22290, or it won't recover
13:33zamundaaa[m]: Company: try `sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover`
13:34DemiMarie: <marcan> "the issue here was that X died..." <- Explicitly refuse to support Xorg. X11 desktops can run in full-screen rootful Xwayland.
13:34emersion: sima: what happens when PAGE_FLIP_EVENT is used with PAGE_FLIP_ASYNC?
13:35emersion: the event is delivered immediately?
13:35sima: when the buffer is being scanned out, which is a bit hw dependent how quickly that happens
13:35sima: I'm not exactly sure what happens to the timestamp/frame number you get
13:36zamundaaa[m]: kchibisov: either install it on your system, or make a separate partition. Live boot may work too, if you restart KWin after installing Mesa from git
13:36kchibisov: I mean, I just don't want to pull it on my system ¯\_(ツ)_/¯. I hoped that it's sort of possible to do inside the qemu just fine.
13:37zamundaaa[m]: If you can do GPU passthrough, that should work too
13:37kchibisov: Do you happen to know whether it's possible to trigger reset on intel hardware?
13:38kchibisov: I don't have amdgpu around anymore.
13:38sima: emersion, honestly not sure what happens to these, but the event should go out as soon as the old buffer stops being scanned out
13:38zamundaaa[m]: I don't know of one, no. Even running an app with a shader that hangs didn't do a full reset on Intel, it just killed the app that was hanging
13:39emersion:writes a patch with "When used with PAGE_FLIP_ASYNC, ¯\_(ツ)_/¯"
13:40kchibisov: zamundaaa[m]: yeah, I've never had a reset on intel hardware, the worst I've seen it could glitch for a short while.
13:40sima: emersion, yeah maybe best, and then cc amd/intel/nouveau/vc4 folks
13:41sima: since those are the drivers that do something with it
13:41linkmauve: Speaking of GPU resets, on my phone sometimes lima dies due to a timeout, and my compositor gets all confused and starts alternating the last frame and the half-rendered frame until I restart it; would implementing GL_KHR_robustness in lima be enough to let it know it lost the context, and would have to recreate it?
13:41linkmauve: Or is it not even a context loss?
13:41Company: yup, works
13:41linkmauve: [drm:lima_sched_timedout_job] *ERROR* lima job timeout
13:41linkmauve: lima 1c40000.gpu: pp task error 0 int_state=0 status=5
13:41linkmauve: lima 1c40000.gpu: pp task error 1 int_state=0 status=5
13:41Company: kinda evil that cat'ing a file resets stuff
13:41linkmauve: ↑ this is what I get in dmesg.
13:41sima: emersion, if I had to guess you simply get the vblank timestamp and frame number for the previous vblank
13:41kchibisov: linkmauve: oh, are you on pinephone?
13:41emersion: yeah that'd be my guess as well
13:41linkmauve: Yes I am.
13:41kchibisov: I think I have the same issue on it with lima..
13:42zamundaaa[m]: linkmauve: that does look like a GPU reset to me. Note that afaik KWin is currently the only Wayland compositor that handles GPU resets at all
13:43MrCooper: Company: it's a debugfs file (accessible to root only), any debugfs file can have any kind of side effects
13:44sima: emersion, maybe an igt sheds some light? or MrCooper?
13:44linkmauve: kchibisov, I’ll try to figure out why GL_ARB_robustness gets exposed in a GL context, but not GL_KHR_robustness in a GLES context.
13:44linkmauve: But I’d like to find a way to reproduce that bug at will.
13:44sima: iirc MrCooper implemented page_flip_async first, so whatever is in there is the uapi :-)
13:45Company: MrCooper: still, I'd have expected echo 1 > file for something like it
13:45MrCooper: emersion sima: I share your guesses for the values in the events; other than that, the events work exactly the same as without ASYNC
13:45kchibisov: linkmauve: were you using catacomb?
13:45linkmauve: zamundaaa[m], which extension do you use? Does it work on lima?
13:45linkmauve: kchibisov, I was.
13:45enunes: linkmauve: that is likely one of https://gitlab.freedesktop.org/mesa/mesa/-/issues/8410 https://gitlab.freedesktop.org/mesa/mesa/-/issues/8415
13:45Company: MrCooper: what's the status with mutter and gpu resets?
13:45kchibisov: I think the way to trigger weird things was to have alacritty windows around on it and try to slide them.
13:46linkmauve: Weird things also work when I run tzompantli and slide it around.
13:46kchibisov: Yeah, because all of them are gles2 apps.
13:46MrCooper: Company: I think there's a vague plan we want to get there eventually, but I don't know of any specific plans or work for it
13:47Company: great - GTK is in-line with that
13:47kchibisov: It's like the way you go fast on the phone and use OpenGL for as much as you can and direct scanout everything.
13:47linkmauve: https://linkmauve.fr/files/glitch.png
13:48kchibisov: Yeah, I've seen that.
13:48linkmauve: kchibisov, here is one instance where the crash happened right at the same time as the glitch.
13:48linkmauve: kchibisov, catacomb isn’t using the GPU for this though I think, it’s using a plane.
13:49kchibisov: yeah, it's all on the planes.
13:49kchibisov: it almost never composites itself, because every client is using dmabuf and on the phone you have at max 3 clients visible at once.
13:49jenatali: Ugh, the addition of blake3 broke our arm64ec build, that's frustrating
13:50kchibisov: And you have 3 planes on pinephone which you could really use for that.
13:54zamundaaa[m]: linkmauve: EGL_EXT_create_context_robustness for creating a robust context, and either GL_ARB_robustness or GL_EXT_robustness (whichever is available) to check for GPU resets
13:56zamundaaa[m]: GL_KHR_robustness does not seem to be supported on lima
13:56linkmauve: None of the three are. :/
14:16zmike: eric_engestrom / DavidHeidelberg[m]: another ci rules change I'd like to see is having gallivm changes only trigger llvmpipe/lavapipe-based jobs
14:17zmike: I tried that in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13439 / https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16191 but maybe I was a bit too aggressive
14:25eric_engestrom: zmike: this looks non-trivial, writing it down for later
14:25zmike: it's a real struggle to make any gallivm changes atm because it triggers every possible driver job
14:25zmike: when it's already been proven that this isn't necessary
14:26eric_engestrom: indeed
14:26zmike: mesa/st uses draw module for some things, which uses gallivm, but llvmpipe jobs would fail anyway if any other driver would break from such changes
14:27eric_engestrom: I think my refactor !24099 will make this kind of change easier to do, if you want to help it land ;)
14:28eric_engestrom: (I know it's big, but the first half is small easy to review commits, and the second half is moving code in blocks with no changes, so not too hard to review either)
14:41alyssa: robclark: I don't suppose there's a2xx drm-shim?
14:42robclark: yes, same drm-shim as other gens should work..
14:42alyssa: oh nice
14:42alyssa: that makes things a lot easier!
14:43alyssa: ok, I have shader-db going. this is some *weird* assembly.
15:02alyssa: run: ../src/gallium/drivers/freedreno/a2xx/ir2.c:342: sched_next: Assertion `instr_v || instr_s' failed.
15:02alyssa: gesundheit
15:05alyssa: got it
15:08robclark: yeah, an instruction has to be either scalar or vector
15:10alyssa: robclark: that was pretty easy actually :~)
15:10alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24118/diffs?commit_id=db276b6049ed68af090d2aab6f29d54ecbee145d
15:11alyssa: lumag: Please test that MR. Thanks!
15:12alyssa: ir3, you're up
15:16lumag: alyssa, ack, will do that either today, or tomorrow.
15:18alyssa: thanks!
15:34enunes: alyssa: hi, about the lima backend fixes for the nir register rework, is it expected to be a change as small as this last one? maybe I can do that if it's not a huge amount of work
15:51alyssa: enunes: should be small yet
15:51alyssa: s/yet/yeah/
15:51alyssa: the lima/pp change will look like that a2xx change there
15:52alyssa: lima/gp will be a bit different, since there you don't want to fold in the registers into instructions at all, you just want to translate load_reg/store_reg to GPIR instructions directly
15:52alyssa: (possibly net negative LOC from the gpir change)
16:01enunes: alyssa: right, I can give it a shot, any hint on how to iterate and check that it is good? should I locally delete struct nir_register or nir_src or something and iterate until it builds and works again?
16:02enunes: or do you have a branch I should pull with the end goal of removing everything we want to get rid of
16:03alyssa: enunes: Everything you need for gpir landed upstream last night
16:03alyssa: for ppir, everything except https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24116/diffs?commit_id=0d8d5af83e037a5115095b2e36e59d15937e8a4b which you should cherrypick I guess
16:04alyssa: for ppir, I would start by changing which NIR passes you call (see the a2xx change as a model, ppir will be the same), and then everything will break because suddenly your compiler will see new intrinsics that weren't there before
16:04alyssa: then rework your nir_src/nir_alu_src/nir_dest/nir_alu_dest -> ppir translation to call into nir_legacy instead so you can ignore the new intrinsics
16:05alyssa: and likewise set abs/neg/sat based on the legacy_alu_src/dest and ignore fabs/fneg/fsat when nir_legacy tells you to
16:05alyssa: nir_foreach_register becomes nir_foreach_reg_decl
16:05alyssa: I expect the ppir patch to look the same as the a2xx. and if it passes deqp you're fine.
16:06alyssa: for gpir, change your NIR passes as above .. but do not call any "trivialize" pass
16:06alyssa: again you'll have piles of crashes (around deqp-gles2 control flow tests I assume) due to new intrinsics
16:06alyssa: ignore decl_reg, implement load_reg/store_reg as moves to/from registers, and drop your support for translating non-SSA sources/dests
16:06alyssa: gpir doesn't use abs/neg/sat modifiers so nothing to do there
16:07alyssa: and again if it passes deqp, you should be fine
16:07alyssa: Happy to review what you come up with
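The gpir advice above (ignore decl_reg, turn load_reg/store_reg into plain moves) can be illustrated with a minimal sketch. Everything here is hypothetical: the `toy_*` and `HW_*` names are made-up stand-ins for illustration and are not NIR, GPIR, or any Mesa API. It only shows the shape of a "direct translate" loop under those assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR; these are NOT the real NIR intrinsics or GPIR nodes. */
enum toy_op { TOY_DECL_REG, TOY_LOAD_REG, TOY_STORE_REG, TOY_ADD };

struct toy_instr {
   enum toy_op op;
   int reg; /* register index for decl/load/store, -1 otherwise */
   int ssa; /* SSA value defined (load/ALU) or consumed (store) */
};

/* Hypothetical backend instruction set. */
enum toy_hw_op { HW_MOV_FROM_REG, HW_MOV_TO_REG, HW_ALU };

struct toy_hw_instr {
   enum toy_hw_op op;
   int reg;
   int ssa;
};

/* Translate n toy instructions into out[]; returns how many were emitted.
 * out[] must have room for at least n entries. */
static size_t
toy_translate(const struct toy_instr *in, size_t n, struct toy_hw_instr *out)
{
   size_t count = 0;

   assert(in != NULL && out != NULL);

   for (size_t i = 0; i < n; i++) {
      switch (in[i].op) {
      case TOY_DECL_REG:
         /* A declaration carries no runtime work: emit nothing. */
         break;
      case TOY_LOAD_REG:
         /* Reading a register becomes a move from it into an SSA value. */
         out[count++] = (struct toy_hw_instr){ HW_MOV_FROM_REG, in[i].reg, in[i].ssa };
         break;
      case TOY_STORE_REG:
         /* Writing a register becomes a move of an SSA value into it. */
         out[count++] = (struct toy_hw_instr){ HW_MOV_TO_REG, in[i].reg, in[i].ssa };
         break;
      default:
         /* Everything else only sees SSA sources and destinations. */
         out[count++] = (struct toy_hw_instr){ HW_ALU, -1, in[i].ssa };
         break;
      }
   }
   return count;
}
```

In this sketch the ALU path never has to know about registers, which is the reason the non-SSA source/dest handling can be dropped in the approach described above.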
16:10alyssa: gerddie: !23089 landed, so once you open an MR for the r600 changes I'll review that :)
16:11enunes: alyssa: cool thanks for the tips, I'll look at it over the next days
16:11alyssa: thanks!
16:11alyssa: main reason I haven't bothered to go deleting stuff is because it'd be absolutely terrible to rebase
17:20penguin42: oh, i915_get_query_result is funny; '*result = 512 * 1024 * 1024;' always that result, whatever :-)
18:24DemiMarie: emersion: from my PoV C is mostly useful for legacy reasons (compatibility with C APIs, wide platform support). If one wants a systems programming language with a good spec that is Ada, not C.
18:24DemiMarie: What is the situation with Vulkan support on e.g. Panfrost?
18:24emersion: DemiMarie: to each their own, i'm not biting again
18:27DemiMarie: emersion: fair
18:42alyssa: enunes: if you can take care of lima/gp at least, that one scares me ;)
19:01DemiMarie: I just saw the device memory TCP patchset. Feels like NetGPU all over again.
19:52airlied: alyssa: why did you think nir_builder_create was a good idea?
19:52airlied: isn't it doing a stack copy of that object now always?
19:54airlied: it's like a 40 byte copy vs an 8-byte, or does it always get inlined?
19:54jenatali: Assuming the C compilers apply the same optimizations they do to C++ code, you get NRVO and it's constructed in place in the caller's frame
19:55airlied: ah cool
20:01pcercuei: It looks dangerous IMHO
20:02airlied: I also assume at O0 it totally goes slow
20:02pcercuei: Ah nevermind - I thought it was returning a pointer to a stack variable. I read the code wrong
20:22alyssa: airlied: that was motivated by ergonomics, not performance
20:22alyssa: but I wouldn't expect a performance change given it should be inlined
20:23alyssa: I didn't benchmark it but it didn't occur to me it would make a difference
20:23airlied: alyssa: tbh I think that is a personal ergonomic preference
20:23alyssa: airlied: the real win is for nir_builder_at
20:24airlied: for a long-time C programmer trained to spot struct copies that reads wrong,
20:24alyssa: if it's a problem we can turn it into a macro instead of a static inline
20:24airlied: so is the opposite of ergonomic, it's downright uncomfortable
20:24alyssa:tilts head
20:25alyssa: It's not supposed to even be doing anything.. it's just supposed to desugar to an initializer.
20:25airlied: you learn the code patterns that look wrong over years, just because the compiler deals with it, doesn't mean you don't squirm every time :)
20:25alyssa: If not for C++, I would even suggest #define nir_builder_create(impl) (nir_builder){.impl = impl, .shader = impl->function->shader}
20:25alyssa: (If that works in C++ we can go ahead and do that.)
20:26airlied: it being inline makes all the difference, just don't see that immediately from reading uses of it
20:26alyssa: ok..
20:26alyssa: I don't know what to say. It didn't occur to me this would be a problematic change and I don't recall this coming up during review.
20:27jenatali: alyssa: That'd work in C++20
20:27jenatali: Assuming that .impl is before .shader in the struct because C++ implies ordering requirements
20:27alyssa: jenatali: aren't we stuck at $OLD_VERSION
20:27jenatali: We default to C++17
20:27jenatali: I don't know that we strictly require it in common code though
20:29airlied: well I think the thing is to be wary of large struct copies if returning structs from things, if that later was to become non-inline for some reason it wouldn't be fun, but otherwise carry on :-)
20:29alyssa: I'm not really sure what you want me to say/do
20:30airlied: nothing :-)
20:30alyssa: okie dokie!
20:30airlied: NRVO saves us all
20:31alyssa: jenatali: If the C++ versions allow, I *would* like that function to be re-expressed as a designated initializer return, but that's a different issue.
20:31alyssa: but also I have actual work to do :~)
20:31jenatali: Yeah, eventually we should start using C++20 for more designated initializers
20:33HdkR: <3 designated initializers
20:37alyssa: same
20:38HdkR: Apparently clang has a bug where `-Wmissing-field-initializers` doesn't work with missing designated initializers. womp womp
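The two shapes discussed above for nir_builder_create (a static inline returning the struct by value, versus a macro that expands to a compound literal) can be sketched as follows. This is a minimal illustration with hypothetical `toy_*` types standing in for the real NIR structs, whose layouts are not reproduced here. The fields are declared in the same order the initializers name them, since C++20 designated initializers require declaration order. Note that the compound-literal macro is a C99 feature and is not standard C++.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real nir_shader, nir_function_impl and
 * nir_builder are different and larger. */
struct toy_shader { int id; };
struct toy_function { struct toy_shader *shader; };
struct toy_impl { struct toy_function *function; };

struct toy_builder {
   struct toy_impl *impl;     /* declared before .shader on purpose */
   struct toy_shader *shader;
   int exact;                 /* left unnamed below, so it is zero-initialized */
};

/* Variant 1: static inline returning the struct by value. Once inlined,
 * a compiler can typically build the result directly in the caller's
 * variable, but that is an optimization, not something C guarantees. */
static inline struct toy_builder
toy_builder_create(struct toy_impl *impl)
{
   assert(impl != NULL && impl->function != NULL);
   return (struct toy_builder){
      .impl = impl,
      .shader = impl->function->shader,
   };
}

/* Variant 2: macro that desugars straight to an initializer (C only). */
#define TOY_BUILDER_CREATE(impl_) \
   ((struct toy_builder){ .impl = (impl_), .shader = (impl_)->function->shader })
```

A caller would write `struct toy_builder b = toy_builder_create(impl);` in either style. The macro evaluates its argument twice, which is one reason to prefer the inline function when the argument has side effects.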