00:38 anholt: mareko: commented on the MR
00:41 mareko: thanks
12:50 MrCooper: karolherbst: the "rusticl: implement CL 3.1 platform features" commit broke the piglit clSetKernelArg test, expected?
12:51 MrCooper: looks like it given the last commit in the MR
13:05 MrCooper: Venemo: "modifiers don't support mip mapping" would imply some kind of conflict between tiling and mipmaps, which I've never heard of; I suspect what was meant might be that there's currently no general API to create a texture with a modifier and mipmaps
13:07 Venemo: MrCooper: are there any mip mapped formats in drm_fourcc?
13:07 Venemo: I think that modifiers are missing some information for mip maps
13:08 Venemo: but then, I don't know much about that. I trust mareko on this
13:08 MrCooper: "mip-mapped formats" isn't a thing, mip-mapping and formats are orthogonal concepts
13:08 zmike: ^ fact
13:10 MrCooper: (I suppose mip mapping can't work with non-filterable formats, e.g. integer ones)
13:10 MrCooper: that's not related to modifiers / tiling though
13:15 daniels: most APIs which consume formats & modifiers only take format/modifier/width/height/stride/fd/offset as params; if you wanted to throw full mip chains around, you'd need either to pack them all into a single memory region and pass the number of levels + surface stride into whatever API consumes that, or have a two-level API where you import each level separately before combining them into a single resource
13:16 mareko: additional data that comes with modifiers like offsets and strides are not enough to describe mipmapping in theory
13:18 MrCooper: not really seeing the difference to tiling without modifiers
13:24 emersion: i'm not too familiar with mipmaps, it's just a regular image with some additional metadata describing where/how the different versions of the image are stored?
13:24 emersion: but it's still using your regular RGBA pixel arrangement?
13:34 MrCooper: looks like the legacy tiling metadata has per-level offsets for GFX8 & older; not for GFX9 & newer though, presumably the offsets are implicitly derived there
13:35 MrCooper: since the per-level offsets can't be communicated with modifiers, there would need to be some kind of convention instead
13:36 mareko: a mipmap is a semi-multi-plane image, the mipmap levels may have offsets or they can be implicitly derived from the base level
13:38 mareko: modifiers only support 2D non-mipmap non-MSAA no Z/S because those features may have other complexities
13:39 MrCooper: can't a simple convention like "smallest possible offset for each mip level" work with modifiers?
13:39 mareko: of course
13:44 Venemo: are there any use cases for using mipmapping with modifiers?
13:44 mareko: no
13:44 daniels: Venemo: I was literally just about to ask why you were even asking
13:44 daniels: MrCooper: AMD and 'smallest possible stride' are not exactly good friends
13:45 karolherbst: MrCooper: yes, piglit is wrong there now
13:46 Venemo: well, then, for gfx6-8 modifiers, can we just leave them as unsupported, for now?
13:47 mareko: they are always unsupported
13:47 Venemo: right
13:50 MrCooper: daniels: see the discussion starting at https://gitlab.freedesktop.org/mesa/mesa/-/work_items/5882 ; some code seemed to be hitting a surface with both a modifier and mip maps, looks like it was a false positive due to DRM_FORMAT_MOD_INVALID/LINEAR though
13:50 Venemo: MrCooper: I already noted that was a bug in the code. it was checking something on images that didn't have modifiers
13:50 linkmauve: A possible usecase could be to display a scale=2 surface on a scale=1 output, in which case the compositor could directly use the level 1 mipmap instead of downscaling the level 0, right?
13:50 MrCooper: mareko: "of course" and "they are unsupported" contradict each other
13:51 linkmauve: Possibly reducing memory bandwidth by 4.
13:52 mareko: MrCooper: no they don't, both are true: 1) modifiers don't support mipmapping, 2) mipmap offsets can be derived from the base level on some GPUs
13:53 MrCooper: linkmauve: one problem being there's no way to communicate how many mip levels there are along with a modifier? maybe that's the real reason for "they are not supported"
13:53 linkmauve: MrCooper, sure, I just wanted to mention a valid reason to eventually someday maybe want them both to be supported for the same texture.
13:54 mareko: some GPUs place mipmap levels on a single 2D plane, so you can have a plane that contains mipmap levels as subimages, somebody would have to analyze mipmap layouts on all GPUs and determine what we can reasonably support and what new extension(s) would need to be created to support mipmapping
13:54 MrCooper: linkmauve: right, seeing as it's been a known issue for at least a couple of decades, I'm not holding my breath :)
13:55 linkmauve: With the added benefit that the client being in control of both mip levels means it could e.g. render twice for optimal rendering (for instance using proper font hinting on both).
13:55 linkmauve: Heh indeed. :)
13:55 linkmauve: Have modifiers been a thing for two decades? :O
13:55 Venemo: linkmauve: that sounds like a lot of overhead for very little, if any, benefit
13:55 MrCooper: it was already an issue before modifiers
13:55 linkmauve: They seemed quite new when I started working on the DRI stack circa 2015.
13:56 MrCooper: not being able to share buffers with mipmaps
13:56 mareko: yes, they are quite new
14:04 karolherbst: So I kinda want to land the ffma rework soonish and I already got a full review, but if anybody else still wants to take a look, please let me know (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165)
14:08 daniels: Venemo: oh, I missed the MR, that's awesome - congrats
14:10 Venemo: thanks daniels :)
14:19 MrCooper: oeuaoeuoeu[m]: is jay fine with the vblank ioctls not working (same as with asahi & nvidia drivers) for virtual / "fake" display drivers, or would it prefer to get the vblank events even with jittery timestamps?
14:50 mahkoh: If the vblank ioctl doesn't work, page flip events will be used instead. That is, I render a frame with the previous damage and submit that to get a page flip event at the next vblank.
14:55 mahkoh: I believe the policy for this is that existing hardware must continue working as is. New hardware on the same driver can stop supporting the ioctl. If someone reported to me that frametimes etc. are worse after updating the kernel, I would report that as a regression.
15:00 MrCooper: "frame with the previous damage" as in the same contents as the previous frame?
15:02 mahkoh: Yes, it will just re-render the contents that are currently being scanned out into a new framebuffer and trigger a flip to that.
15:08 MrCooper: why not extrapolate from the last completion event instead?
15:09 MrCooper: what you describe is guaranteed to miss at least one display refresh cycle
15:11 MrCooper: FWIW, tzimmermann posted a series to make that change, you'd have to speak up there
15:11 mahkoh: They should forward changes affecting compositors to wayland-devel.
15:11 MrCooper: true
15:12 cssushiman: How do page flip events work, exactly? Do they trigger just before the blanking interval, or do they trigger just after the interval starts?
15:13 mahkoh: I'm using vblank events to estimate the time of the next vblank event. I see no reason to estimate multiple frames into the future when the kernel already has a perfectly fine (if software) timer that does exactly that.
15:13 mahkoh: MrCooper: can you post a lore.kernel.org link?
15:14 tzimmermann: mahkoh, sure, i'll send the next update out to wayland-devel. until then: https://lore.kernel.org/dri-devel/b5d03921-1e6f-4c4f-900e-fc9e28222176@mailbox.org/T/#t
15:14 MrCooper: cssushiman: not defined specifically, mostly up to the HW / driver; the timestamp in the event though corresponds to the end of vertical blank
15:16 MrCooper: tzimmermann: I kind of agree with mahkoh that allowing vblank events to work might make sense
15:17 MrCooper: mahkoh: that said, I don't really see what you described buys over extrapolating, when the vblank ioctl doesn't work
15:17 mahkoh: tzimmermann: afaict your series only affects drm_wait_vblank_ioctl and not drm_crtc_queue_sequence_ioctl. Therefore my software would be unaffected.
15:17 tzimmermann: mahkoh, yes it's only about wait_vblank
15:18 MrCooper: hmm, I was assuming it affects both, seems like an artificial distinction
15:18 mahkoh: However, this change looks like an obvious regression to me. If someone wrote a special-purpose compositor for, e.g., some POS hardware, and that compositor hard relied on the ioctl because it has always worked with their specific hardware, then they could no longer upgrade the kernel.
15:18 MrCooper: it's the same functionality really
15:19 cssushiman: MrCooper: Responding to your answer to my question, does that mean that on a given example driver, the event could fire at the beginning of the interval, with the timestamp given always being placed at the end of the interval?
15:19 MrCooper: yes
15:19 cssushiman: Thanks! :-)
15:20 mahkoh: tzimmermann: what is the motivation for this change?
15:20 tzimmermann: MrCooper. hmm, right. there's a check for vblanks in get_sequence
15:21 MrCooper: cssushiman: the exact timing of when the event is sent / received / processed by user space varies for many reasons, nothing can be derived from it
15:21 mahkoh: MrCooper: I don't own any hardware that doesn't support the vblank ioctls. The fallback was put in place because nvidia users reported that the compositor was frozen. Apparently it works quite well for them.
15:22 tzimmermann: mahkoh, these ioctls (or_wait_vblank at least) were for sync'ing with hardware. when emulating the vblank in software, there's nothing to sync with
15:23 tzimmermann: mahkoh, the vast majority of drivers does not support vblank interrupts.
15:24 tzimmermann: it's just that most of the common hardware does
15:24 MrCooper: mahkoh: mutter handles that without a no-op frame
15:24 mahkoh: tzimmermann: if the kernel turns the hardware into a pure VRR device, that is, as long as userspace waits REFRESH_RATE time after each page flip it submits, it can submit a new page flip and that page flip will be scanned out immediately, then your change is fine.
15:25 mahkoh: But if the kernel submits its own "implict" page flips at each vblank, even if userspace does not itself submit a page flip, then those timings need to be communicated to userspace.
15:25 tzimmermann: mahkoh, there's no 'hidden' page flip. page flip timing corresponds with display refresh
15:25 zmike: jenatali: I forget what the right fix for this is https://gitlab.freedesktop.org/mesa/mesa/-/jobs/99966040
15:26 cssushiman: mankoh: I know that with NVIDIA drivers, if I forget to lock a GBM buffer before I remodeset it, drmModeSetCrtc() will deadlock, which results in a freeze.
15:26 zmike: is it that I cannot at all statically initialize an array?
15:27 cssushiman: I know that with NV drivers, there are other gotchas like that in regards to EGL as well. Intel drivers are very forgiving, but NVIDIA's ... not so much.
15:27 tzimmermann: mahkoh, your compositor also needs to drop frames when the page flip takes longer than one display refresh
15:27 mahkoh: tzimmermann: my point is that the kernel will only start scanning out at fixed points in time adn those points in time are determined by a software timer. userspace needs to know about that timer and when it fires. I don't want to run my own timer in userspace that hopefully is in sync with the kernel timer.
15:28 mahkoh: If you want to create a new ioctl that exposes this timer but is not called "vsync", I'd be fine with that.
15:29 mahkoh: cssushiman: I don't use GBM for anything but one-time buffer allocation.
15:29 MrCooper: mahkoh tzimmermann: I don't really agree with that distinction; if the kernel driver doesn't actually use a fixed refresh cycle, that should be communicated to user space explicitly
15:30 tzimmermann: what you're asking for has never worked with any kernel. there are no timing guarantees in general.
15:33 cssushiman: mankoh: Are you beginning your raster routine just *after* the VBLANK, or are you trying to time it to finish just before the VBLANK? What exactly are you trying to accomplish with this change?
15:34 mahkoh: I'm timing it to finish just before the next vblank. I don't know what change you mean; I'm not suggesting any change.
15:36 cssushiman: Interesting. When I was doing my own rendering, I learned that trying to time everything to finish before every vblank was a bad idea, since the render routine would inevitably end up breaking into the vblank interval, and cause a bunch of visual artifacts.
15:37 mahkoh: By finish I mean that the GPU has reported that the rendering has completed.
15:40 mahkoh: tzimmermann: to be clear, since you're also setting DRM_VBLANK_FLAG_SIMULATED on amdgpu, if you changed your patches to also affect the other ioctl, I'm reasonably sure that that would cause a regression.
15:40 jenatali: zmike: Looking
15:40 zmike: ty
15:40 cssushiman: mahkoh: Oh. Right. Are you refering to the 3D engine? That's a completely different component from what we're talking about here. The 3D engine has its own way of determining when rendering is finished, via fences. The blanking interval, and framebuffer scanout is handled by the display engine.
15:41 mahkoh: tzimmermann: Ah, the file is called amdgpu_vkms.c. If that's not the code for normal amdgpu hardware, you can ignore my previous message.
15:41 tzimmermann: mahkoh, it's in the amdgpu's vkms code. it appears disconnected from the other components
15:41 jenatali: zmike: Search for msvc_designated_initializer - currently only defined for nir tests but we can just move that up to the top-level meson.build
15:42 zmike: jenatali: you mean in meson ?
15:42 mahkoh: cssushiman: Yes, I'm aware. I schedule the rendering to complete X time before the next vblank.
15:42 jenatali: Yeah it's an option string that gets passed to override_options
15:42 jenatali: To be clear I mean in the meson.build files in Mesa
15:42 zmike: I see
15:43 zmike: hm
15:43 zmike: dcbaker: is there some way to globally set override_options ?
15:43 MrCooper: cssushiman: visible artifacts means there's a synchronization bug somewhere
15:44 tzimmermann: mahkoh, BTW you can run your software with efidrm to test with a driver with no vblank irq
15:44 cssushiman: mahkoh: How do you accomplish that, exactly? GPUs could slow down, or hiccup for any reason. If you force the rendering to be done at the end of the ~16ms interval between vblanks, than you will have a half-rendered frame morphing into the scanout.
15:46 MrCooper: cssushiman: note that the nvidia kernel driver doesn't handle implicit synchronization (arguably violating KMS UAPI), so the compositor needs to either wait for rendering to finish before posting the buffer to KMS, or set the IN_FENCE_FD property to reference a sync_file which represents the pending rendering, to avoid artifacts with it
15:47 jenatali: zmike: You can set default_options at the top level but those can still be overridden at the command line. Not sure if there's something better than that
15:47 mahkoh: cssushiman: I use geometric decay to estimate the time rendering will take based on the previous frames. The "X time before vblank" also serves as a buffer so that if rendering takes a little bit longer it will still finish in time. Everything in my compositor uses CPU waits, I never submit any unfinished frames to the kernel. If rendering doesn't finish in time, then I just miss
15:48 mahkoh: vblank and add a penalty to X that will slowly decay to the previous X.
15:49 cssushiman: mankoh: Well, I suppose that works well enough.
15:51 mahkoh: Apparently so. I've never measured it myself but some people have and it seems that my input latency is as good as it gets: https://bsky.app/profile/khyperia.bsky.social/post/3ln6zznxaqk2z
15:52 MrCooper: mahkoh: sounds similar to mutter's dynamic frame scheduling
15:53 mahkoh: In any case, I need accurate and up-to-date vblank times for that to work :)
15:54 MrCooper: mutter doesn't need the vblank ioctl for dynamic frame scheduling in general, only for minimizing input → output latency specifically
15:58 tzimmermann: mahkoh, again, there is no such thing a as "accurate and up-to-date vblank times" in general. some devices do that, some don't
16:02 mahkoh: Fine, what I really need is the next time a page flip submitted to the kernel can turn into light, i.e. the time that would reported to userspace in the page flip event.
16:02 mahkoh: Vblank is a convenient proxy for this.
16:06 cssushiman: I think there is a way to predict the next interval with the timestamp of the previous pageflip event? I don't know how well this would work in your program specifically, but I think it's managable.
16:08 mahkoh: That doesn't help if the last page flip was long in the past. Which brings us back to the start of this conversation: I submit dummy page flip requests in such situations so that I can use the value frome the page flip event as a substituted for the value from the vblank event.
16:08 mahkoh: (Dummy = same image rendered again just on a new framebuffer)
16:08 cssushiman: Oh, sorry. I didn't mean to circle back around like that
16:10 MrCooper: mahkoh: just not seeing the point of posting the same contents again for that, instead of the latest available contents
16:11 mahkoh: Is a page flip event generated if userspace doesn't change any of the atomic state, i.e. it submits the same fb again?
16:12 mahkoh: I believe that was my concern at the time. I don't remember if I tested this.
16:13 emersion: as long as the page-flip flag is set, yes
16:13 MrCooper: not following what you're getting at / how it's related to what I wrote
16:14 mahkoh: MrCooper: Posting the latest available contents means posting the same contents that were already in the previous commit. I.e. the same fb. I'm asking if that would actually do what I want, generate a page flip event, or if it would just be a no-op.
16:14 MrCooper: posting different contents requires a different FB
16:14 mahkoh: Sorry, I misread.
16:15 mahkoh: MrCooper: This is for wl_surface.frame requests and such. They are dispatched at the next vblank after being submitted. Same for fifo barriers.
16:16 mahkoh: If there is new content, it will be rendered. But if there are only wl_surface.frame requests or fifo barriers, then the contents might be the same.
16:16 MrCooper: can extrapolate and make worst-case assumptions for commit-timing constraints; timing isn't relevant for FIFO barriers
16:20 mahkoh: I have an implementation that works well. I see no reason to change it when I've never had a bug report related to that code and everyone talking about wp_fifo and wp_commit_timing says that it works perfectly on Jay.
16:21 mahkoh: Surely any day now nvidia will add support for vblank events to their driver.
16:21 MrCooper: haha, that's a good one :)
16:22 MrCooper: I've been pestering them about it for years
16:22 MrCooper: stopped holding my breath long ago
16:23 mahkoh: It must be a hard problem since they said they had bumped the priority based on my feedback around 18 months ago.
16:23 cwabbott: does anyone want to look at the SPIR-V/NIR parts of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41562?
16:24 MrCooper: mahkoh: honestly not sure what could be the problem
16:25 MrCooper: AFAIK all it takes is calling a function roughly around vertical blank, and a callback which returns the current scanout position
16:26 MrCooper: and I don't remember hearing of any issues with nouveau, so the HW seems capable enough
16:27 cssushiman: Will there come a time when the NVIDIA drivers actually work properly?
17:26 elibrokeit: zmike: dcbaker can surely tell you, that the topic of setting "minimum required c++ std" project wide has come up before
17:27 elibrokeit: but we currently don't have a real answer for this, other than that in practice nobody ever does something as silly as set the cpp_std from the command line without first checking what the project defaults to, and using something equal-or-better
17:28 elibrokeit: jenatali: ^^ (also)
17:29 sghuge: konstantin: pixelcluster: Quick question, Can we skip the TLAS build with no instances? I see currently we still end up encoding an internal node with no children.
17:32 konstantin: Is that an issue? On amd we need at least one node
17:36 dcbaker: zmike: on the designated initializer thing you could use `add_project_options(designated initializer opts>, language : ['c', 'cpp'])`
17:36 zmike: hm
17:37 jenatali: You'll get warnings though about redundant C++ version setters though, won't you?
17:37 jenatali: The option is to set C++20
17:38 sghuge: konstantin: nope, Intel HW skips it..even if we don't encode the node. I was just wondering is that the case on AMD..
17:40 dcbaker: oh, yeah, for cpp_std you don't want to do that
17:40 dcbaker: Let me go check something
17:43 dcbaker: dang, I was hoping you could be crafty in project call, but you can't
17:45 dcbaker: the closest thing I think you could do ATM is somethign like project({default_options : 'cpp_std' : ['c++20', 'c++17']}) and then do a check with meson.get_compiler('cpp').get_id() == 'msvc'` to enforce that it didn't fall back to c++17
17:45 zmike: do you want to put up a MR ? :)
17:45 dcbaker: We have also talked about how to set the minimum standard. There's been proposals to do things like `cpp_std : '>= c++17'`
17:45 dcbaker: but no code has ever happened
17:45 dcbaker: sure
17:45 dcbaker: lol
17:45 zmike: thanks!
17:46 pixelcluster: sghuge: you can most likely optimize TLAS build with 0 instances (AMD could just memcpy an empty node into the bvh) but i'm not sure if it's a case worth optimizing for
17:48 dcbaker: jenatali: should we be using c++20 or c++latest? I see both in the code and I'm not sure what versions of MSVC we actualy support
17:48 dcbaker: (or both, like c++20,c++latest,c++17)
17:49 jenatali: I'm fine with limiting ourselves to versions of MSVC that support proper c++20
17:49 dcbaker: I can certainly write the code as c++20,c++latest,c++17
17:49 jenatali: There's been others asking for supporting not-latest MSVC and I haven't tracked exactly which versions people care about though
17:49 dcbaker: okay, I think I'll leave latest in there and we can drop that later if we want to
18:16 dcbaker: Sigh, the nir stuff basically makes c++20 required in general
18:16 dcbaker: Because any c++ code using the NIR headers needs it.
18:16 dcbaker: For sure that means Intel's BRW backend, I'm sure ACO as well
18:17 dcbaker: feels like if something a central as NIR means we need c++20 we might as well just require C++20?
18:19 glehmann: yes please, I want to use bitfield initializers
18:27 alyssa: +1
18:29 dcbaker: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41605
18:48 karolherbst: jenatali: wanna review https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41488 ?
18:51 jenatali: karolherbst: The clc change seems fine (assuming that LLVM's fine with canonical forward slashes and doesn't need backslashes for Windows) but did you mean to revert the static libstdc++ change?
18:51 karolherbst: yeah
18:51 karolherbst: I have no idea if that works on windows tho...
18:54 jenatali: It probably does
18:54 jenatali: The CI will tell you for sure, it runs that code in the build job
18:57 karolherbst: I just hate having this issue and the davinci resolve support team going "nah we only support centos 8 or something, sucks being you" 🙃
18:57 karolherbst: I wished we could use dlmopen in the ICD loader, but that's not possible because that also breaks davinci...
18:57 jenatali: Yeah. Welcome to my life
18:58 karolherbst: dlmopen breaks because of gl_sharing and CL runtime having to call into the applications gl implementation, which obviously doesn't work anymore with dlmopen...
19:25 Ristovski: Venemo: randomly stumbled upon your NGG blog post where you have a note about hw support "Vega had something similar, [...] Based on public info I could find, it’s not even worth looking into". Happen to still remember what that was? Got me kinda curious :P
19:27 Ristovski: I checked the SW vs HW stage mapping table in the ACO readme, but I see nothing obvious
19:27 Venemo: Ristovski: what would you like to know about it?
19:28 Ristovski: Venemo: I assume whatever is possible on Vega is not worth due to some limitation that wouldn't make it as efficient as on GFX10+ with all the merged stages?
19:29 glehmann: no public code for vega's primitive shaders exists, so nobody even knows how it works
19:29 Ristovski: Ah, unfortunate
19:31 glehmann: I still wonder if that whole thing wasn't a miscommunication between the technical teams and marketing and the LS/HS and ES/GS merges were all that was planned for vega
19:35 glehmann: on the other hand, there was v_screen_partition_4se_b32 which was clearly intended for some kind of software primitive culling
19:36 Venemo: Ristovski: from the very little information that I could find some years ago, AFAIU the problem was that Vega's "NGG" was supposed to work with the GS ring, so it wasn't directly connected to the rasterizer. that's just my interpretation of it though.
19:36 Venemo: I wouldn't worry about it
19:37 glehmann: if it's not directly connected to the rasterizer, isn't it just ESGS?
19:37 Venemo: sometimes chip designers test new things this way, and if it doesn't work they tweak it in the next gen
19:37 Venemo: glehmann: I guess.
19:39 Venemo: but yeah, it took them until Navi 3 to get it mostly right
19:41 glehmann: but that generation also made hw culling so fast that for VS only workloads there are few benefits to the NGG design
19:43 Venemo: yeah
19:51 Ristovski: oh right, there was also the whole "implicit" path for PS that AMD later cancelled, now I recall
20:05 Venemo: I have no idea what that is
20:24 Ristovski: Venemo: iirc it was marketed that the driver would supposedly rewrite existing shaders to utilize primitive shaders at runtime, which would allow for existing game titles to benefit from the performance boost, then AMD demoted it to an explicit feature (no details given afaik, i.e. if it would be an explicit stage or some internal API)
20:24 Ristovski: it was such a mess that even finding references to it is hard lol https://www.techpowerup.com/240879/amd-cancels-implicit-primitive-shader-driver-support
20:40 zmike: dcbaker: that didn't go as smoothly as anyone expected 🤕
20:41 dcbaker: Fortunately I actually like C++, so this is a semi-fun distraction from coverity bugs
20:41 dcbaker: I do not like coverity bugs
20:41 dcbaker: especialy off by one's and "I don't have enough context to understand this is fine"
20:42 dcbaker: So, some changes to the nir may be required
20:42 dcbaker: nir generators that is
20:46 zmike: then I am happy to have created enjoyable work for you