00:02airlied: nope that is the marge-bot job
01:23airlied: twice now the full marge-bot run has gone 3 mins over the hour
01:55Wallbraker: Are you allowed to return image usage from swapchains for an extension you haven't enabled?
01:56Wallbraker: The ATTACHMENT_FEEDBACK_LOOP_BIT_EXT bit is set, but I haven't enabled the extension.
02:08zmike: I don't think there's any restriction on what drivers can return, but you can only use what you enable
02:08Wallbraker: Oki thanks.
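(A minimal sketch of zmike's point, in C: the driver may report usage bits from extensions the app never enabled, so mask them before swapchain creation. The app_enabled_feedback_loop flag is a hypothetical piece of app state:)

```c
#include <vulkan/vulkan.h>
#include <stdbool.h>

/* Drivers may report usage bits belonging to extensions the app never
 * enabled; mask those out before using them for swapchain creation. */
VkImageUsageFlags usable_swapchain_usage(VkPhysicalDevice phys_dev,
                                         VkSurfaceKHR surface,
                                         bool app_enabled_feedback_loop)
{
    VkSurfaceCapabilitiesKHR caps;
    vkGetPhysicalDeviceSurfaceCapabilitiesKHR(phys_dev, surface, &caps);

    VkImageUsageFlags usage = caps.supportedUsageFlags;
    if (!app_enabled_feedback_loop)
        usage &= ~VK_IMAGE_USAGE_ATTACHMENT_FEEDBACK_LOOP_BIT_EXT;
    return usage;
}
```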
02:45dcbaker: bnieuwenhuizen: I’m only working half days right now for personal reasons, but the CI has been so flaky of late that I can’t reliably pull patches. I haven’t tried since Monday, to be fair, so I need to try again tomorrow
03:02gfxstrand: dcbaker: Hey! Since you're here, how are things in the world of Meson and crates/proc macros?
03:04Lynne: airlied: patchset pushed in ffmpeg, got tired of waiting for marge to become unstuck
03:05airlied: Lynne: I'll keep throwing at the wall until it lands
03:07dcbaker: gfxstrand: I’ve been kinda disconnected for a bit. I’m actually on bereavement right now, so I’m trying to do release stuff because it’s mostly mindless busywork
03:09gfxstrand: dcbaker: That's fair.
03:09gfxstrand: I can ask Xavier
03:10gfxstrand: I think we're probably a month or two out from merging NAK anyway.
03:10gfxstrand: Hoping for an XDC merge or so, maybe?
03:19DemiMarie: gfxstrand: is there a document explaining what is so awesome about the future described in <https://lore.kernel.org/all/CAOFGe957uYdTFccQp36QRJRDkWQZDCB0ztMNDH0=SSf-RTgzLw@mail.gmail.com/>?
03:22DemiMarie: You’ve convinced me not to fear it, but it would be nice if there was something I could show others.
03:26DemiMarie: wonders if she should start a document summarising what she has learned
03:31DemiMarie: hopes to someday see the blog post mentioned in that message
03:32gfxstrand: DemiMarie: No. Writing that blog post has been on my ToDo for like a year now.
03:34DemiMarie: gfxstrand: fair, and you (obviously!) don’t owe me anything, so don’t feel bad about it.
03:34gfxstrand: You're fine.
03:35gfxstrand: I can feel bad about my blog backlog all on my own. :-P
08:08ayaka_: pq, about that HDR metadata, I don't cover the case where the GPU renders it
08:09ayaka_: because the GPU can't render it. Nowadays not much hardware uses plain pixel formats, and the GPU can't read the vendor's tiled or compressed pixel formats
08:13pq: ayaka_, sorry, what GPU render?
08:14pq: ayaka_, even if all sources go directly to KMS planes, userspace still needs to inspect all metadata, decide what metadata to send to the video sink, and program the color pipelines of each KMS plane and CRTC accordingly. No GPU rendering involved.
08:15pq: and that color pipeline programming cannot be left for the KMS driver to guess
08:15ayaka_: pq, yes, that is another problem, how to negotiate the format of that metadata with KMS
08:16pq: no, that's not what I mean
08:16pq: even if the KMS driver fully understands all metadata of everything, it *cannot* come up with the proper color pipeline universally
08:17pq: that must be done by userspace, because it involves policy and end user preferences
08:17ayaka_: I think I know your concern, even on an embedded platform with only one CRTC
08:17pq: therefore userspace needs access to all metadata of everything
08:18ayaka_: you need to know the target screen colorspace and you may need to change it (like from YUV to RGB, or 8 bits to 10 bits)
08:18pq: no, I'm talking in a more general sense.
08:19pq: There is no single correct way to map from one set of metadata to another. Which way to use and what additional adjustments to do is policy and end user preferences.
08:20ayaka_: like, we have two CRTCs, where one can accept HDR while the other can't?
08:20pq: Hard-coding policy and preferences in KMS drivers could work for a downstream that is specific to a particular product, but it's not good for upstream, which needs to work for many different use cases.
08:21pq: ayaka_, no. I mean that converting from one HDR or SDR standard to another HDR or SDR has no single right way to do it.
08:22ayaka_: pq, well that could be a case, but you should know that you usually can't convert dynamic HDR to SDR in those KMS devices
08:22ayaka_: we usually do that in separate hardware
08:22ayaka_: as for SDR to HDR, that is not a case this RFC's metadata covers
08:22pq: ayaka_, the community is already convinced that KMS shall not be programmed with colorspace etc. definitions, letting drivers come up with the conversion in between, but with explicit mathematical operations in the KMS color pipeline, decided by userspace.
08:23ayaka_: pq, then how do you present an HDR10+ or DV image
08:24pq: by extending KMS properties to include HDR dynamic metadata to be sent to the sink, and userspace inspecting that dynamic metadata frame by frame and adjusting KMS color pipeline accordingly when necessary.
08:25pq: What you tell the video sink as the metadata is completely separate from how you program the KMS color pipelines.
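(For the sink-metadata half of that split, a minimal sketch in C using the existing HDR_OUTPUT_METADATA connector property and libdrm; the caller is assumed to have looked up the property ID already, e.g. via drmModeObjectGetProperties(), and the luminance values are purely illustrative:)

```c
#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <xf86drmMode.h>
#include <drm/drm_mode.h>   /* struct hdr_output_metadata */

/* Send HDR static metadata to the sink. This only describes the signal
 * going over the cable; it programs no color operation whatsoever. */
static int send_hdr_metadata(int fd, uint32_t connector_id, uint32_t prop_id)
{
    struct hdr_output_metadata meta;
    memset(&meta, 0, sizeof(meta));
    meta.metadata_type = 0;             /* HDMI_STATIC_METADATA_TYPE1 */
    meta.hdmi_metadata_type1.eotf = 2;  /* SMPTE ST 2084 (PQ), per CTA-861 */
    meta.hdmi_metadata_type1.max_display_mastering_luminance = 1000;
    meta.hdmi_metadata_type1.max_cll = 1000;
    meta.hdmi_metadata_type1.max_fall = 400;

    uint32_t blob_id;
    if (drmModeCreatePropertyBlob(fd, &meta, sizeof(meta), &blob_id))
        return -1;

    drmModeAtomicReq *req = drmModeAtomicAlloc();
    drmModeAtomicAddProperty(req, connector_id, prop_id, blob_id);
    int ret = drmModeAtomicCommit(fd, req, 0, NULL);
    drmModeAtomicFree(req);
    return ret;
}
```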
08:25ayaka_: how? are you going to create many properties for colorspace data?
08:25ayaka_: kms properties
08:25pq: because you cannot derive a single correct pipeline from two sets of metadata alone, there are always more variables involved, and innovation to be had
08:26pq: we only need KMS connector properties for all metadata to be sent to the sink.
08:27ayaka_: that would be the plane properties
08:27pq: Independently, we need KMS plane and CRTC properties to program the color pipeline mathematical operations.
08:27pq: no
08:27ayaka_: you already know we have HDR planes (video) and SDR planes (UI)
08:27pq: you never set a colorspace on a KMS plane, no
08:27pq: I don't, actually
08:28ayaka_: yes, I am always wondering why we put colorspace (BT.709, BT.601) on the connector
08:28pq: your per-plane color pipeline capabilities can be different, sure, but they are not described in terms of HDR or SDR. They are described in terms of what mathematical operations they can do on pixel values.
08:29ayaka_: I should say vendors won't like this idea
08:30pq: We've had long threads on dri-devel@ trying to figure out how to salvage the "Colorspace" connector property in general, and it doesn't look good.
08:31ayaka_: I will talk about that plane colorspace problem later, let me explain why generic properties for color won't work
08:31pq: The latest consensus is to make it work with just "default" and "BT.2020", getting rid of the RGB/YCC variants, and essentially leaving all other options more or less undefined.
08:32ayaka_: because dolby vision won't allow that
08:32ayaka_: what about display p3
08:32pq: I'm not sure Dolby Vision can ever be in upstream Linux.
08:33pq: Linux requires open specs of all metadata, and I don't think Dolby Vision will provide that?
08:33ayaka_: I know, it may be because we can't offer them an interface to do so
08:33ayaka_: no, Dolby would not, that is why I would send a data blob and let the driver write to registers directly
08:34ayaka_: even I don't know what the data means
08:34pq: yeah, that cannot fly upstream
08:34pq: you need to keep that interface downstream
08:34pq: if there is a solid definition of what display P3 means in the "Colorspace" property, then that should be salvageable.
08:34ayaka_: besides, this metadata is not just for HDR but also for other metadata. We could discuss the HDR part, which you care more about
08:35pq: right, I'm not sure it is good to lump all kinds of metadata together
08:35ayaka_: I would talk about the requirement for those static colorspaces later
08:36ayaka_: I didn't; in the previous email, I said we could define a common header for such metadata
08:36pq: for connector "Colorspace", we only need entries for those things that can be communicated in HDMI and DisplayPort signals. It's not about conversion at all.
08:36ayaka_: then each driver in the pipeline knows which metadata it needs to process
08:37ayaka_: pq, you must have noticed the HDMI PHY can convert colorspace
08:37pq: convert what exactly?
08:38pq: It's in the source side still, not sink?
08:38ayaka_: both sides, it is at the connector level, like from BT.709 limited (MPEG) range to BT.709 full range
08:39pq: If it's source side, it's just an implementation detail of the KMS color pipeline. If it's sink side, it doesn't even concern us. The only things that concern us is what goes over the cable: the metadata and the pixel values.
08:39ayaka_: I forgot the name of the Synopsys HDMI PHY
08:40pq: We assume that when we send the metadata to the sink, the sink adheres to it. It makes no difference how it does that, as long as it does.
08:40ayaka_: let me explain when we use this function in hdmi
08:40pq: I don't see how it's relevant.
08:40pq: it's just a driver internal detail how it programs the source side hardware components.
08:41ayaka_: I don't know whether it is relevant or not
08:41ayaka_: let me explain why plane has the colorspace first
08:41pq: KMS UAPI describes the KMS color pipelines, and the driver's job is to map that to hardware any way it likes.
08:42pq: KMS UAPI also lets userspace set the metadata that is sent to the video sink, and it is userspace's responsibility to program the color pipelines to match what the metadata says.
08:44ayaka_: let us regard the CRTC (the part before the PHY) as a compositor
08:44pq: What happens at the sink end is irrelevant as long as the sink handles the pixel values according to the metadata. Otherwise, the sink is faulty and not interesting anyway.
08:44ayaka_: let's suppose the TV requests an RGB pixel format
08:45ayaka_: while the data for a plane is YUV; it is important to know the colorspace and range of this YUV format
08:45ayaka_: or the compositor can't output the correct image
08:46pq: if by compositor you meant a Wayland compositor, I would agree.
08:46ayaka_: no, the hardware compositor
08:46pq: CRTC not really
08:47ayaka_: in embedded platforms, GPU, CRTC and PHY are three different hardware
08:47pq: Sure, but CRTC and PHY are not exposed to userspace as separate things.
08:48ayaka_: they are: drm_crtc and drm_connector
08:48pq: Userspace only knows about the KMS abstractions called plane, (confusingly) CRTC, and connector. These do not match hardware CRTC or hardware PHY 1:1.
08:49ayaka_: so in your case, we should program the connector properties that set the colorspace of each plane?
08:49pq: the KMS UAPI uses an abstract model that the KMS driver can map to actual hardware any way it wants.
08:50pq: no
08:51ayaka_: there are three colorspaces you need to care about (actually four): planes, compositor hardware, and PHY in and PHY out
08:51pq: connector properties are only the metadata being sent to the sink
08:51ayaka_: so the colorspace could be the plane's properties?
08:52pq: no
08:52ayaka_: then how to render in this case
08:52pq: userspace programs the KMS color pipeline mathematical operations so that whatever is in the framebuffers of each KMS plane, the end result after color pipelines and composition matches the metadata being sent to the sink.
08:53pq: at no point you tell KMS about the colorspace of any framebuffer
08:53pq: you only program the operations
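(This is what "program the operations" looks like with a property that already exists on drivers exposing it: the CRTC "CTM" property takes a 3x3 matrix in S31.32 sign-magnitude fixed point, an explicit mathematical operation with no colorspace name attached. A minimal sketch; the property-ID lookup is assumed done elsewhere and the coefficients are whatever userspace decided:)

```c
#include <math.h>
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>
#include <drm/drm_mode.h>   /* struct drm_color_ctm */

/* Pack a double into the CTM wire format: S31.32 sign-magnitude,
 * sign in bit 63. */
static uint64_t ctm_fixed(double v)
{
    uint64_t mag = (uint64_t)llround(fabs(v) * 4294967296.0); /* * 2^32 */
    return (v < 0 ? (1ULL << 63) : 0) | mag;
}

/* Program an explicit 3x3 matrix on the CRTC. Userspace chose the
 * coefficients; KMS is never told any colorspace name. */
static int set_ctm(int fd, uint32_t crtc_id, uint32_t ctm_prop_id,
                   const double m[9])
{
    struct drm_color_ctm ctm;
    for (int i = 0; i < 9; i++)
        ctm.matrix[i] = ctm_fixed(m[i]);

    uint32_t blob_id;
    if (drmModeCreatePropertyBlob(fd, &ctm, sizeof(ctm), &blob_id))
        return -1;

    drmModeAtomicReq *req = drmModeAtomicAlloc();
    drmModeAtomicAddProperty(req, crtc_id, ctm_prop_id, blob_id);
    int ret = drmModeAtomicCommit(fd, req, 0, NULL);
    drmModeAtomicFree(req);
    return ret;
}
```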
08:54ayaka_: are you going to program EOCF function as the property
08:54pq: yes
08:55ayaka_: then you need to have function 1, function 2, ... function N
08:55pq: but not as EOCF per se, but as a mathematical curve, which could be one of enumerated ones for example
08:55pq: yes
08:55pq: AMD's private KMS UAPI proposal already has those
08:55pq: and the generic KMS color UAPI plans have them too
08:56ayaka_: any example of that?
08:56pq: I'm trying to find the latest generic proposal atm.
08:56ayaka_: besides, my proposal doesn't prevent userspace from reading it. I would let vendors just not take the DRM property part
08:58pq: emersion, do you have a link at hand to the latest KMS new color pipeline UAPI draft?
08:58emersion: https://lore.kernel.org/dri-devel/QMers3awXvNCQlyhWdTtsPwkp5ie9bze_hD5nAccFW7a_RXlWjYB7MoUW_8CKLT2bSQwIXVi5H6VULYIxCdgvryZoAoJnC5lZgyK1QWn488=@emersion.fr/
08:58ayaka_: because it is just a container for different kinds of metadata, not just HDR
08:58pq: ayaka_, I specifically replied to this sentence: "I don't want the userspace access it at all."
08:59emersion: hwentlan_: did you have time to experiment a bit with an impl for the RFC?
08:59pq: and the email made the point that HDR metadata would be a use case here
08:59emersion: is your WIP code pushed somewhere?
08:59pq: emersion, thanks!
09:00pq: ayaka_, have a look at emersion's link above.
09:01pq: ayaka_, the most important concept there is the "prescriptive approach".
09:01ayaka_: yes, the proposal covers the LUT case too; the 1D ones are not that bad
09:02pq: I mean the explanation of why we do not set framebuffer colorspace in KMS at all.
09:03pq: We do need to set the metadata to be sent to the sink, but that's completely independent in the UAPI compared to what happens to pixels.
09:04pq: ...in the source side
09:06pq: Now, if you have a metadata pass-through from video decoder to KMS, and no way for userspace to read and understand that metadata, and that metadata causes global effects on the final image in the sink, then this whole model simply doesn't work anymore.
09:07ayaka_: pq, from my first email, where I quoted sima's sentence
09:09ayaka_: HDR is just an excuse for why a common data container is needed; what I need to deliver is the vendor pixel format compression options
09:10pq: Ok. So it was just a mistake to take HDR metadata as an example.
09:10ayaka_: no, dolby vision is still the case
09:10pq: and we concluded that dolby vision cannot happen
09:11ayaka_: yes, it does; many Android devices have supported Dolby Vision, not just Synaptics
09:11pq: with vendor downstream BSPs, I presume
09:11ayaka_: anyway, just regarding it as vendor data is enough
09:11pq: data is fine, as long as it is fully documented
09:12ayaka_: for a GPU like AMD, which only supports one plane
09:13ayaka_: this property is fine. But for a case like Intel, we would have a CSC pipeline for each YUV plane
09:14pq: If you need additional planes for wacky metadata that describes how to decode a framebuffer into pixel values, that's totally fine. It only affects how a framebuffer is read in order to produce input to the KMS color pipelines.
09:14ayaka_: all right, maybe those RGB planes as well
09:16ayaka_: the Wayland compositor needs to know the output colorspace and decide the proper EOCF or OECF for every plane
09:17ayaka_: it is a little complex for a demo app
09:17pq: yes, a Wayland compositor does need to know all colorspaces
09:17pq: However, userspace must know the colorspace of the pixel values entering the KMS color pipeline, so that userspace can program the KMS color pipeline correctly. The same with GPU composition: userspace must know what colorspace texturing from the buffer will produce. This means that the "hidden" metadata cannot change in ways that would change the resulting colorspace.
09:17ayaka_: GPU is the other case
09:18ayaka_: if we don't count AFBC, the GPU usually can't render the vendor pixel formats, which most HDR data would use
09:19pq: that's another problem, but more about the (Wayland) compositor design, so ok
09:19ayaka_: anyway, I would just quote your "If you need additional planes for wacky metadata that describes how to decode a framebuffer into pixel values, that's totally fine. It only affects how a framebuffer is read in order to produce input to the KMS color pipelines." in my future email. Plane properties won't break GKI (the Android Generic Kernel Image)
09:19pq: sure! but include my "However" as well.
09:20pq: demo apps also are not enough to prove a new UAPI, so you are going to need a proper userspace anyway
09:20ayaka_: is the "However" the GPU part?
09:21daniels: the point of GKI was to get people actually working upstream to co-operate on generic userspace. picking random properties you can stuff unknown magic blobs into isn't doing that, it's just a really bad version of ioctl()
09:21sima: concurs with daniels
09:21pq: the whole https://oftc.irclog.whitequark.org/dri-devel/2023-08-25#32426532;
09:21pq: the whole highlighted one line
09:21sima: we pretty much stopped taking random properties in upstream drivers because of this
09:22ayaka_: daniels, if you are talking about V4L2, that is where my pain comes from. I should blame upstream for designing bad interfaces that vendors find hard to fit their devices into
09:23sima: also the reason why ADF got shot down; its commit function was just a huge blob
09:23pq: lunch, bbl
09:24ayaka_: as I said, you can't ask too much of the vendor; if there were no GKI, my boss would tell me to finish the work as soon as possible
09:25sima: we have a few decades of tradition of "asking for too much from vendors" here in upstream gpu :-)
09:25ayaka_: pq, that highlighted part is fine. But in practice, we would say the colorspace is DV without telling which variant of it
09:26daniels: ayaka_: shrug, it is how it is
09:26daniels: ayaka_: think of it this way - you're coming in and telling upstream that you're ignoring (or haven't even read) any of the design around colour management, and you're looking for a way to completely subvert it and do something against that design, so you don't have to care about anything upstream does
09:26daniels: why would any sane upstream accept that patch? there's zero motivation to do so
09:27daniels: if you're at least involved in the design discussions and implementation, then sure, you get a voice
09:27daniels: but that hasn't happened up until now, so ... shrug
09:27daniels: if this makes GKI hard for the vendors, then that's a problem for the vendors to solve, and they can solve that by actually participating
09:27ayaka_: as I always said, that didn't stop NVIDIA from doing what they want. What I am doing is keeping people from making designs so bad that nobody could understand them
09:28daniels: NVIDIA don't do GKI either
09:28ayaka_: except the vendor itself
09:28daniels: AMD, on the other hand, spent a long time participating in both the design and the implementation of upstream colour management
09:28ayaka_: because it is nvidia
09:28daniels: so it's not like upstream and vendors are completely different things
09:29daniels: some participate (costs time, gives the benefit of a voice); others don't participate (benefit of being easier, cost is you have no say in what upstream does)
09:29ayaka_: let me focus on my point: I am going to sell my RFC as a generic metadata container exchange interface between driver interfaces
09:30daniels: it's not going to get merged
09:31ayaka_: daniels, so any idea about exchanging vendor data attached to a graphics buffer?
09:32daniels: I mean, it seems pretty clear that you either haven't read the section of the DRM docs about new uAPI (which is bad), or you have read it and you think 'these rules are only for other people' (worse)
09:32ayaka_: I know we need a FOSS userspace implementation
09:33Kayden: that's kind of the bare minimum though. just because there is a userspace available that could use a uAPI that upstream doesn't like, doesn't mean they're going to like/accept it
09:34daniels: right. in general you're just dumping a problem ('GKI means vendors have to do more work'), and trying to transfer the problem to other people ('hey DRM people, I haven't bothered contributing anything to help solve your problems, but take this to solve my problem, and take the burden of supporting that forever'). it's a really really bad tradeoff for upstream.
09:35ayaka_: that is not what I am thinking about
09:36ayaka_: it is that I can't convince the vendor to accept a clear and FOSS implementation
09:36ayaka_: nor convince my boss to do so. So I have an idea that balances the secure and the open
09:39ayaka_: my RFC is just solving a simple problem: how to deliver vendor-specific data from one driver interface to another, where the vendor won't tell you the details of what it is
09:39daniels: yes, that is a problem _for vendors_
09:39daniels: 'how do I transfer opaque blobs that do stuff I have no idea about' is not a problem that upstream has
09:40ayaka_: if we solve this problem, at least we would have DRM drivers that could display images beyond DV
09:40daniels: so why should upstream accept the burden of maintaining this interface forever?
09:43Kayden: yeah, I really don't see amd/intel/others being in favor of merging a generic blob passer
09:43ayaka_: I think I have explained why I need this metadata in my previous email
09:44ayaka_: the patchset is at its fifth version; I wonder when it will be merged
09:44daniels: you have explained why _you_ need this metadata
09:44daniels: you have not explained why _upstream_ needs this metadata
09:45ayaka_: I don't know why upstream needs this metadata either
09:46emersion: you need to convince the community that it needs it
09:46emersion: if you want to ship something
09:46ayaka_: if such a data exchange mechanism is reason enough to attract vendors to contribute their drivers
09:46emersion: but vendor-specific blobs don't sound like a great API
09:47ayaka_: because that pixel format is vendor specific
09:48ayaka_: could you display an Intel Yf CCS image on another vendor's hardware?
09:48daniels: CCS is very well understood
09:49daniels: 'blob of stuff that does stuff' is not well understood
09:49daniels: Intel also spent _years_ of effort plumbing modifiers through the entire stack
09:49daniels: putting in that effort, which benefits everyone, is what gains you credibility in upstream
09:49ayaka_: does Intel tell you how to decompress it?
09:50ayaka_: or did ARM give out the AFBC algorithm?
09:50daniels: by contrast, you are showing up years after colour-management design discussions started, after years of work has happened between Collabora/AMD/Google/Valve/others, and saying 'I haven't even looked at the other stuff but you need to merge my stuff which completely subverts the design'. it's _hugely_ disrespectful if nothing else.
09:51daniels: if you want a formal NAK to the mailing list to help your internal discussion about how you need a proper submission, I can provide one
09:52ayaka_: I didn't say so, I just said Dolby Vision won't give out their IP
09:52daniels: (the CCS/AFBC examples are totally different - not only is there OSS code which does de/compress them, but it's a very well-understood intermediate transition phase - input->output->input is something you can measure and process. this is talking about input + unknown aux input -> unknown output. that's completely different to lossless compression!)
09:53daniels: right, and NVIDIA wouldn't give out their IP either. but our answer to that wasn't to merge their driver upstream.
09:54ayaka_: daniels, you may be mistaken, there are two patchsets
09:55ayaka_: one is for Synaptics pixel formats (the metadata is about decompression), one is for metadata exchange (HDR is the excuse, as it is commonly found in bitstreams)
09:56ayaka_: https://lore.kernel.org/lkml/20230402153358.32948-3-ayaka@soulik.info/
09:56daniels: yes, I've seen
09:57ayaka_: I think I offer the same info about the pixel formats as Intel does
10:00Kayden: documentation that says that things "have variants" / "may work a certain way" / "we won't describe it" / "is similar to Intel's Y tile but not" isn't striking me as great documentation
10:00ayaka_: Kayden, I have listed most of the common variants there
10:01emersion: the patch has more details iirc
10:02ayaka_: Kayden, and I have explained why it is similar but not the same in the next sentence
10:02ayaka_: I can't tell what the other two criticisms are pointing at
10:16ayaka_: daniels, could you tell where Intel or ARM gave out the algorithm? I think I could bring the description from their documents
10:22ayaka_: I found something like Intel® Integrated Performance Primitives
10:23daniels: there's an igt_ccs test which does CCS, and AFBC also has open implementations
10:24daniels: but again, those are merely intermediate stages: known input -> AFBC -> de-AFBC, produces known output
10:24daniels: known input + DV -> display gives unknown output
10:24ayaka_: daniels, just ignore the DV
10:24daniels: you're asking for a generic mechanism to allow drivers to do completely unknown things
10:25ayaka_: I am talking about a pixel format with compression options that would work with or without DV
10:26daniels: ok, if you're instead asking about the Synaptics modifiers, I think all that's missing is actually describing the tile layout
10:26ayaka_: the container is for that (also things like a secure pipeline's key ID)
10:26daniels: if you want to know what to aim for, look at the AMD/NV/Intel/AFBC modifier descriptions, where the (super)tile size/layout/etc is made very explicit within the modifier
10:26ayaka_: daniels, where? Should I draw a layout in the document
10:27daniels: just look at the other vendors, and describe your modifiers to the same level of detail
10:27emersion: in drm_fourcc.h
10:27daniels: the container discussion isn't worth having; as per above, it's fundamentally not going to be accepted
10:28ayaka_: yes, I think I am no worse than NVIDIA; maybe my English didn't make it clear
10:30ayaka_: daniels, because they are the parameters for the algorithm, and I don't know the algorithm myself
10:31ayaka_: emersion, what was missing in the drm fourcc document part https://lore.kernel.org/lkml/20230402153358.32948-2-ayaka@soulik.info/
10:33daniels: you explicitly state that the super/sub tiling layout is unknown
10:33daniels: NV/AMD explicitly describe the layout
10:33daniels: that's one big difference
10:33ayaka_: well, 48x4 pixels, where a tile has 3x4 pixels and 8 bits of padding at the end of a tile
10:34ayaka_: you could simply calculate the layout
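(Those numbers do pin the layout down if one assumes 10-bit pixels: 3x4 pixels x 10 bpp = 120 bits, plus the 8 bits of padding, is exactly a 16-byte tile, and a 48x4 super-group is then 16 such tiles. A hypothetical offset calculation under those assumptions; the tile ordering within a group is a guess, and stating it explicitly is exactly what the modifier documentation needs to do:)

```c
#include <stdint.h>

/* Hypothetical layout: 10 bpp, 3x4-pixel tiles (120 bits + 8 bits padding
 * = 16 bytes each), 16 tiles left-to-right within a 48x4 group, groups
 * row-major with stride_groups groups per row. */
#define TILE_W      3
#define TILE_H      4
#define TILE_BYTES  16
#define GROUP_W     48
#define GROUP_TILES (GROUP_W / TILE_W)   /* 16 tiles per group */

static uint64_t tile_offset(uint32_t x, uint32_t y, uint32_t stride_groups)
{
    uint32_t group_x = x / GROUP_W;
    uint32_t group_y = y / TILE_H;
    uint32_t tile_in_group = (x % GROUP_W) / TILE_W;

    uint64_t group_index = (uint64_t)group_y * stride_groups + group_x;
    return (group_index * GROUP_TILES + tile_in_group) * TILE_BYTES;
}
```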
10:42ayaka_: I think the introduction section may be misleading; the point is that hardware may not read from memory in logical address order (think of memory banks)
10:44ayaka_: if that bothers you, I could offer a version without the super-group or compressed formats.
10:45ayaka_: In short, the modifiers have the parameters for the hardware, except the compression options when the compressed version is used
10:47ayaka_: if you ignore the bit descriptions like padding, that is how we program the hardware
12:31ayaka: emersion, about the blob for LUT data, I think the blob ID could be replaced with a shmem fd (memfd). For example, the upstream element (a decoder) offers the HDR10+ metadata
12:32ayaka: and userspace reads it, knows it's HDR10+ data, and it can be sent to KMS
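(For concreteness, the mechanism being proposed would look something like this on the producer side; the name and the raw-bytes layout are hypothetical, and emersion's objection below is to the concept, not the mechanism:)

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* A decoder hands HDR10+ metadata to userspace as a sealed memfd. */
static int export_metadata(const void *data, size_t len)
{
    int fd = memfd_create("hdr10plus", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    /* Seal so the consumer can trust the contents won't change. */
    fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE);
    return fd;
}
```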
12:32emersion: we already discussed this
12:32emersion: we do not believe that performance is an issue
12:32emersion: so we do not believe that this is necessary to optimize
12:32ayaka: yes, it is not
12:33emersion: also, i don't think tying metadata to buffers is a good idea
12:33daniels: ^
12:34Lynne: did something break wayland? I'm getting a segfault in wsi_CreateSwapchainKHR/wsi_wl_surface_create_swapchain
12:35emersion: Lynne: do you have a stack trace?
12:38ayaka: Stop blaming me about that; I am sure upstream won't accept it. But I am sure people will implement DV this way
12:38ayaka: HDR10+ data is open and small, but a page-length LUT is not, although it won't change
12:38daniels: ayaka: people can do whatever they want to downstream. I have no idea why you believe that upstream are required to accept whatever anyone thinks of.
12:39ayaka: sorry, I want to say, don't blame me on that
12:39Lynne: emersion: wl_proxy_get_version with a null proxy parameter
12:40ayaka: at least I tried; also I want to say I am not disrespecting upstream or trying to waste people's time
12:42ayaka: and I can tell my boss the upstream solution is not suitable for us
12:52hwentlan_: emersion, I have most of the API stuff there, with a simple pipeline in VKMS and an IGT test... There are still bits missing to show how I envision this to work, but I'll push my (very messy) WIP branches today
12:52pq: ayaka, I'd like to say there is nothing against your person or your employer. It's just the design concept that does not fit upstream.
12:57daniels: yeah, absolutely
12:58ayaka: this really makes me feel better. People who know me know that I am struggling to get as many drivers as possible working properly in the Linux kernel
12:59ayaka: but there are many challenges involving many parties whose minds I can't change
13:00daniels: hopefully by clearly stating the upstream principles, it's easier to take back to the decision-makers and tell them: 'the problem isn't that I can't convince them, the problem is that our approach is not compatible'
13:01pq: ayaka, I'm sure there are. Having to commit to interoperable and maintained forever interfaces is worlds apart from doing an integrated product that no-one (else) cares how it works inside.
13:06ayaka: I don't want to create a separate world and push people to Google Android or Chromebook. But I should say that if this doesn't work here, I could try my luck with Google. In that case, though, there is not much restriction on the vendor
13:06pq: ayaka, do you now have a feeling of what makes a design incompatible? That a design needs to produce predetermined results, even if some intermediate data was hardware-specific or undecipherable?
13:09pq: just after saying that, I realize that HDR static metadata fails that test: the metadata has more or less open specifications, both in itself and in HDMI and DP specs, but its results are... hardware-dependent in monitors >_<
13:10ayaka: you could analyse the signal
13:10pq: I mean some monitors ignore metadata, others ignore different bits of metadata
13:11ayaka: not exactly
13:11swick[m]: I actually believe that we can have an opaque blob thing modifying the colors, as long as it happens after the very end of the exposed CRTC pipeline
13:11pq: most monitors ignore some bits of metadata
13:11swick[m]: if that happens still in the CRTC, or PHY, or the sink itself doesn't really matter
13:12pq: so I guess either is needed: a spec of the data, or predictable results (like compression metadata is unknown, but the result is identical to the original data)
13:13ayaka: the EDID or extended EDID would let you know which HDR formats the TV supports
13:14pq: swick[m], you mean a little bit as if it was the sink doing that on its own? But where would you get the right kind of data matching the content?
13:14swick[m]: pq: up for userspace to figure out
13:14swick[m]: if user space has content then it should be able to get the matching metadata blob
13:14ayaka: for our video case, there won't be SDR to HDR (unless you are using AI)
13:15pq: ayaka, for example, my HDR monitor seems to ignore almost all HDR static metadata, but it does check if maxLuminance > 100.
13:15pq: swick[m], if it's an opaque blob, how could userspace figure it out?
13:15swick[m]: it gets it from somewhere, most likely a video decoder
13:15ayaka: pq, but it doesn't stop you from sending the other HDR metadata to it, right?
13:15swick[m]: this whole thing won't be useful to me because you can't do compositing anymore in that case because it would invalidate the blob
13:15ayaka: so the CSC pipeline still works in new HDR apis
13:16swick[m]: but for specific use cases... why not
13:16DemiMarie: gfxstrand: if your blog post could explain why that future is still secure and fully virtio-GPU native context compatible that would be amazing.
13:16pq: ayaka, no, it just ignores most of the metadata. Some other monitor might not ignore the same way. Ergo, we have hardware-specific behaviour, which is unpredictable in general.
13:16DemiMarie: Security people generally aren’t convinced by what Windows and game consoles do.
13:17ayaka: pq, yes, but there is nothing you could or should do here; the signal is out of the PHY
13:17pq: ayaka, yes, but it's still bad. I can never be sure what I'm actually displaying.
13:18swick[m]: this is acceptable for a lot of use cases though
13:18swick[m]: and so is DV
13:18pq: if I knew which parts of metadata the monitor ignores, and I knew the monitor's limits, I could compensate in the source.
13:18swick[m]: all true, but does this matter in this case?
13:19pq: but I can know neither, not without the SBTM standard at least
13:19swick[m]: if someone wants to play back a DV source then the proprietary, unknown process is what they signed up for
13:19swick[m]: they can and do change the details of how they process the metadata
13:20DemiMarie: swick: maybe the answer is that DV is not suitable for upstream
13:20ayaka: pq, as I said, you can't know, because the monitor's EDID or extended EDID won't tell you
13:20swick[m]: that's not the answer. the answer is that user space knows what it wants
13:20ayaka: and the signal is right
13:20swick[m]: and we have to design KMS so that it can achieve what it wants
13:21swick[m]: and achieving the exact same output via the color pipeline and shader is one possible user space scenario
13:21ayaka: although I know a MIPI or HDMI analyser is very expensive, especially when it comes to UHD or 8K
13:22swick[m]: just pushing through a DV video with the metadata with possibly some overlay which will get slightly changed by the DV metadata is also a completely reasonable user space scenario
13:22emersion: Lynne: bleh, on it
13:23swick[m]: we already have HDR static metadata. the point of them is that after the pixel pipeline exposed by KMS there will be some adjustments which are guided by the metadata
13:23swick[m]: if that happens in the CRTC, PHY or the sink is not relevant
13:23swick[m]: I don't see how Dolby Vision is different here, other than being an opaque blob
13:24swick[m]: and if a sink implements the DV metadata guided conversions or the PHY or CRTC does it is also utterly irrelevant from a user space POV
13:24pq: As a display system developer, my goal is to present content the way it is intended to be perceived. I cannot do that if I don't know what's happening. There are a couple of ways to go about that: either I target a reference display in reference environment and trust that the monitor adjusts the picture to the actuals, or I know the actuals and target those directly while the monitor doesn't adjust.
13:26zamundaaa[m]: The difference is that we don't want to be stuck in this situation. We want monitors that are predictable, and we don't want opaque steps in between userspace and those predictable monitors that we hopefully will eventually get
13:26swick[m]: pq: completely reasonable, but other people have other goals and I think that's fine
13:26swick[m]: as long as that doesn't contradict with other goals that is
13:27pq: but it seems I'm usually given the worst of the two: a standard signal format, unknown actuals, and a monitor that does not adjust well enough.
13:27swick[m]: all very true and extremely frustrating pq, zamundaaa
13:28swick[m]: but things like DV are a thing and are very much the opposite of what we're aiming for
13:28hwentlan_: DV = Dolby Video?
13:28pq: that's why I dislike different monitors ignoring different bits of metadata
13:28swick[m]: hwentlan_: dolby vision
13:28hwentlan_: ah, right
13:29pq: swick[m], I haven't yet started replying to your DV comments, maybe later. :-)
13:30hwentlan_: I would love to support it someday. Haven't looked at it closely. Would be nice if there could be something like a closed-source userspace library that deals with it and spits out a 3DLUT or some other well-defined operations that can be programmed through the color pipeline API
13:30swick[m]: the only way to support DV in a composited system without screwing up color accuracy of the rest of the content is to apply the DV transformation before the compositing. that makes the whole "DV after the exposed color pipeline" thing unusable
13:30swick[m]: hwentlan_: yeah, that would probably be how we'd have to do it
13:31swick[m]: in the wayland color management protocol we could have a "pre-apply LUT" thing that acts directly on the provided pixel values from the buffer
13:31swick[m]: and then the compositor can figure out how to integrate that
13:33swick[m]: but I'm with ayaka that DV metadata could be supported in the KMS API just like HDR static metadata, if the hardware has the capability to do the DV transformation in the CRTC or PHY, or if the sink somehow supports DV.
13:34swick[m]: how to supply the DV metadata blob is another question
13:34zmike: mareko: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24849
13:36ayaka: swick[m], hardware can't do DV CSC as far as I know
13:36ayaka: because Dolby didn't release a license for it
13:37ayaka: you can only design the hardware as Dolby says
13:37swick[m]: not surprised, I'm just saying that conceptually it doesn't matter
13:59ayaka: swick[m], also, emersion killed the possibility of metadata attached to a framebuffer
14:00emersion: i did not "kill" it
14:00swick[m]: yeah, that one is a hard sell
14:00emersion: i just said that i don't believe it's a viable path
14:00emersion: we went that path with implicit sync and we're trying to undo it now
14:01emersion: in general, it sounds like a good idea until it's not
14:01ayaka: so your idea is with daniels, that "The mechanism (shmem, dmabuf, copy_from_user, whatever) is _not_ the problem. The problem is the concept."
14:02ayaka: the barrier here is the secure memory; even if I could guess what is in the DV metadata
14:03ayaka: I can't access it. The only possibility is the Synaptics compressed pixel formats, but the further details are unknown to me
14:08Lynne: emersion: vulkaninfo crashed in wsi_wl_surface_get_capabilities2 even with the patch
14:10emersion: damn
14:11Lynne: also not seeing immediate swapchain mode supported in mpv, not sure if that's intended
14:12emersion: the compositor doesn't support the ext most likely
14:13Lynne: ah, no support in wlroots/sway yet?
14:14emersion: not yet, there is a MR
14:14emersion: hm, i think that's a bug in vulkaninfo?
14:14emersion: > If pPresentModes is NULL, then the number of present modes that are compatible with the one specified in VkSurfacePresentModeEXT is returned in presentModeCount
14:14emersion: however it seems like no VkSurfacePresentModeEXT is chained?
14:15apteryx: have there been other reports than mine regarding OpenGL regressions on old NVIDIA GPUs using Nouveau after moving to Linux 6.x? https://gitlab.freedesktop.org/drm/nouveau/-/issues/192
14:15emersion: or am i misreading the spec?
14:15emersion: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkSurfacePresentModeCompatibilityEXT.html
14:20emersion: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkGetPhysicalDeviceSurfaceCapabilities2KHR.html#VUID-vkGetPhysicalDeviceSurfaceCapabilities2KHR-pNext-07776
14:20emersion: If a VkSurfacePresentModeCompatibilityEXT structure is included in the pNext chain of pSurfaceCapabilities, a VkSurfacePresentModeEXT structure must be included in the pNext chain of pSurfaceInfo
14:23emersion: https://github.com/KhronosGroup/Vulkan-Tools/issues/846
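(The fix vulkaninfo needs, per the valid-usage rule emersion quotes: chain a VkSurfacePresentModeEXT into pSurfaceInfo whenever a VkSurfacePresentModeCompatibilityEXT is chained into the capabilities. A minimal sketch:)

```c
#include <vulkan/vulkan.h>

/* Ask which present modes are compatible with FIFO, chaining both structs
 * as VUID-vkGetPhysicalDeviceSurfaceCapabilities2KHR-pNext-07776 requires. */
static void query_compatible_modes(VkPhysicalDevice phys_dev,
                                   VkSurfaceKHR surface)
{
    VkSurfacePresentModeEXT mode = {
        .sType = VK_STRUCTURE_TYPE_SURFACE_PRESENT_MODE_EXT,
        .presentMode = VK_PRESENT_MODE_FIFO_KHR,
    };
    VkPhysicalDeviceSurfaceInfo2KHR info = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SURFACE_INFO_2_KHR,
        .pNext = &mode,          /* the chaining vulkaninfo was missing */
        .surface = surface,
    };
    VkSurfacePresentModeCompatibilityEXT compat = {
        .sType = VK_STRUCTURE_TYPE_SURFACE_PRESENT_MODE_COMPATIBILITY_EXT,
        .pPresentModes = NULL,   /* first call: just fetch the count */
    };
    VkSurfaceCapabilities2KHR caps = {
        .sType = VK_STRUCTURE_TYPE_SURFACE_CAPABILITIES_2_KHR,
        .pNext = &compat,
    };
    vkGetPhysicalDeviceSurfaceCapabilities2KHR(phys_dev, &info, &caps);
    /* compat.presentModeCount now holds the count; allocate pPresentModes
     * and call again to retrieve the actual modes. */
}
```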
14:42hwentlan_: emersion, pq, swick, and anyone else interested... the very much WIP work for the color pipeline API:
14:42hwentlan_: kernel: https://gitlab.freedesktop.org/hwentland/linux/-/merge_requests/5#5d07be320faf72c3151afbba0d107b5c03141412
14:42hwentlan_: libdrm: https://gitlab.freedesktop.org/hwentland/drm/-/merge_requests/1
14:42hwentlan_: igt: https://gitlab.freedesktop.org/hwentland/igt-gpu-tools/-/merge_requests/1
14:43swick[m]: hwentlan_: oh, cool! will take a look next week
14:43hwentlan_: I intend to bring them to a point where I have two real pipelines, one with more than one op
14:43hwentlan_: and then clean them up
14:43emersion: nice!
14:43hwentlan_: for now it's best to not look at individual patches, but look at the "Changes" tab on the MR, i.e. look at the entire diff
14:44swick[m]: ack
14:44hwentlan_: I also need to implement a basic algorithm for the pipe discovery and programming in IGT
14:44hwentlan_: that's sort of next on the list
14:45koike: hwentlan_ btw, since you have it on fdo, I was wondering if you could test your changes with drm ci (drm topic/drm-ci branch), and it would be great to have your feedback on the ci :D
14:45hwentlan_: let me know any feedback you have for the bits that are there. the API stuff and the bits for the new drm_object should pretty much be there
14:46hwentlan_: koike: hmm, never looked at it. does it run IGT?
14:47hwentlan_: koike: gfx-ci/drm-ci?
14:47pq: swick[m], you're right from that point of view, that things like network cards are well accepted, and NICs can be used to send arbitrary payload anywhere. So why shouldn't a KMS driver be able to do the same. OTOH, the KMS payload affects everything KMS does on that screen, and a DV blob does unknown things. I think opaque vs. understood blob is a major difference, even if the results are not predetermined.
14:50swick[m]: both metadata that you understand and an opaque metadata blob result in more or less arbitrary color transforms anyway
14:50swick[m]: I think this is mostly a question of policy and has no technical relevance
14:51swick[m]: I would certainly prefer metadata that I understand
14:52pq: indeed
14:53pq: I also think there is huge political difference whether an unknown blob is used to program local hardware vs. sent outside as-is.
14:54swick[m]: how so?
14:55swick[m]: oh, you mean because programming local hardware is part of the kernel driver and if something goes wrong and we don't understand the data we send to the hardware that's just horrible
14:55swick[m]: yeah, I guess I agree
14:56pq: yeah, like that, or even executing that unknown code on local hardware, even if it's not the CPU
14:57pq: the very existence of a mandatory unknown blob is a violation of a user's right to use the hardware they own for whatever they want to, but for practical reasons it often (firmware) has no alternative
14:58pq: "I bought a monitor that understands DV, but I can't produce any DV myself to make use of it."
14:59pq: as long as a DV blob does not program any local hardware, I really don't know what to think of it.
15:00pq: Do we let users enjoy DV content with their DV monitors and support the proprietary system, or not.
15:05pq: I suppose it has a practical answer: I do not have a license to develop, review, or test anything related to DV. So that's it. Someone would have to figure out if Linux can even legally forward prebaked DV blobs in general.
15:06karolherbst: users caring enough probably also run linux-libre where things like that will probably be patched out anyway
15:07karolherbst: and the others just want things to work
15:10koike: hwentlan_ yes it runs igt on several devices, git://anongit.freedesktop.org/drm/drm branch topic/drm-ci
15:10pq: I mean, could e.g. Red Hat get sued if Fedora Workstation shipped a kernel that allows activating the DV mode of a DV certified display after the end user installs some proprietary video player? Or worse, some FOSS project reverses enough of the DV blob to make use of it.
15:11karolherbst: RH has lawyers to figure that out
15:11pq: would be nice to know before people waste time on it
15:11karolherbst: but given how much of a problem h.264 was... maybe the answer is that RH won't support it
15:12karolherbst: yeah... maybe it makes sense for people who know the details well enough to bring that up; do we have any lawyers we could ask from an fdo/linux kernel perspective? does the Linux Foundation have lawyers we could ask?
15:13koike: hwentlan_ basically the branch I sent applies this patch https://lists.freedesktop.org/archives/dri-devel/2023-August/418499.html (with a few fixes to the commit, but they don't affect its execution); you basically just need to apply this commit, go to the settings of your Linux GitLab fdo CI, and point the CI yml file to drivers/gpu/drm/ci/gitlab-ci.yml
15:13karolherbst: though I guess from a pure linux perspective it doesn't matter, as only distributions/vendors ship binaries
15:13karolherbst: and it's their problem
15:13pq: personally, I really cannot be interested in going through all that trouble to support a proprietary ecosystem
15:13karolherbst: yeah....
15:14karolherbst: but also the linux desktop ecosystem is kinda lacking a lot, and it won't get better if we choose not to support those things. Maybe DV doesn't matter and that's the end of it, maybe it matters a lot and it will be a deal breaker, no idea myself :) I just think a general user isn't happy if they buy fancy hardware and nothing works
15:15karolherbst: or like users have a netflix subscription but only get 720p content, $because
15:15karolherbst: even though owning 4K@120 hardware
15:15koike: hwentlan_ this is an example of a pipeline it runs https://gitlab.freedesktop.org/helen.fornazier/linux/-/pipelines/970661 , the branch is already included in linux-next (in case you want to test on top of that)
15:16karolherbst: in a perfect world everything would be open and we wouldn't have such issues, but reality isn't as nice to us, so we are left with that and have to figure out how to make the best of it, and what "the best" even means here
15:17koike: hwentlan_ you can even point to your version of igt, so it builds your version
15:17karolherbst: and we already have constraints anyway, and what if e.g. a kernel regresses with certain blobs nobody understands? that's also a major issue, I just don't know if that's even relevant in this case
15:18hwentlan_: koike: thanks for the great pointers. Will take a look at that
15:18pq: hwentlan_, awesome :-)
15:28Lynne: pq: libplacebo supports dovi
15:29Lynne: it can even convert dovi to regular hlg hdr
15:31Lynne: ah, but only the profile used in web distribution, blu-rays use a different profile which probably couldn't be supported without major DRM changes
15:32Lynne: that profile requires two frames to correctly present, a regular 10bit 4k image, along with an 8-bit 1080p image, of which only the top two bits are set
15:34Lynne: really, dovi is basically a flexible compatibility layer from which other HDR variants can be generated
15:36MrCooper: pq: "which parts of metadata does this monitor respect/ignore?" seems like another thing which could be tracked in a libdisplay-info database (though in some cases it might depend on firmware version, which I'm not sure can be reliably determined)
16:37DemiMarie: karolherbst: personally I think Netflix should be handled on dedicated media player hardware.
16:38DemiMarie: Better yet would be for DRM to just be outlawed, but that won’t happen.
16:41DemiMarie: <pq> "I suppose it has a practical..." <- Yup. Something that can’t be reviewed can’t be accepted.
16:50gcarlos: Hi guys, I just sent my patchset to the kernel mailing list but got some strange error from git send-email and messed it up by sending just half of it. What should I do? I tried to send the remaining patches manually but they didn't reply to the patchset thread :(
16:53karolherbst: DemiMarie: sure, but that's not what users have right now
17:00karolherbst: uhhh
17:01karolherbst: why are there spir-vs with the `Shader` _and_ the `Kernel` cap?
17:01karolherbst: *sigh*
17:04penguin42: are there any open tools for reading Radeon profiling registers - if there are any to read - I'm after finding info on thing like bank conflicts and the like
17:26agd5f: penguin42, mesa supports the GL_AMD_performance_monitor extension. You can also use something like RGP: https://gpuopen.com/rgp/
17:35penguin42: agd5f: I did download RGP and got the GUI running but it complained of being unable to start; I'm assuming it wants AMD rather than standard Linux drivers but it wasn't clear (I'm on F39). any examples of using GL_AMD_performance_monitor?
17:37penguin42: oh hang on, rdp has started up today - it didn't want to do that the other day
17:46penguin42: agd5f: I'm missing how to actually gather a profile (this is an OpenCL application) - I have the RDP open that lets me configure stuff, and I have RGP which looks like it would be great to analyse a profile if I had one
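(For the GL_AMD_performance_monitor question: the extension's flow is enumerate groups and counters, select some on a monitor object, bracket the workload, then read back results. A minimal sketch assuming the entry points are resolved, e.g. via epoxy or glXGetProcAddress; error handling and the string/type queries are omitted:)

```c
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>

static void profile_once(void)
{
    /* Enumerate counter groups exposed by the driver. */
    GLint num_groups = 0;
    GLuint groups[16];
    glGetPerfMonitorGroupsAMD(&num_groups, 16, groups);

    /* Enumerate counters in the first group. */
    GLint num_counters = 0, max_active = 0;
    GLuint counters[16];
    glGetPerfMonitorCountersAMD(groups[0], &num_counters, &max_active,
                                16, counters);

    /* Select one counter on a monitor object and bracket the workload. */
    GLuint monitor;
    glGenPerfMonitorsAMD(1, &monitor);
    glSelectPerfMonitorCountersAMD(monitor, GL_TRUE, groups[0], 1, counters);

    glBeginPerfMonitorAMD(monitor);
    /* ... issue the draws/dispatches to measure ... */
    glEndPerfMonitorAMD(monitor);

    /* Poll until the result is available, then read it. */
    GLuint avail = 0;
    while (!avail)
        glGetPerfMonitorCounterDataAMD(monitor, GL_PERFMON_RESULT_AVAILABLE_AMD,
                                       sizeof(avail), &avail, NULL);

    GLuint data[64];
    GLint written = 0;
    glGetPerfMonitorCounterDataAMD(monitor, GL_PERFMON_RESULT_AMD,
                                   sizeof(data), data, &written);
    /* data holds (group, counter, value...) records; the value width
     * depends on the counter type (query with GL_COUNTER_TYPE_AMD). */

    glDeletePerfMonitorsAMD(1, &monitor);
}
```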
18:25idr: https://mesa.pages.freedesktop.org/-/mesa/-/jobs/48021949/artifacts/results/summary/results/trace@gl-zink-anv-tgl@gputest@pixmark-piano-v2.trace.html isn't showing the expected results or the differences.
18:26idr: Any suggestions?
18:28dj-death: idr: update the hash
18:28dj-death: idr: it's known to change with compiler changes
18:28dj-death: idr: not just on intel
18:28idr: Right... but usually it will show the before and after images so that you can decide if the change is okay.
18:28idr: I have had cases where an image change was a bug.
18:29idr: The reference image is just a broken image link. :(
18:33dj-death: yeah
18:33dj-death: not sure what's up with that
18:35idr: Hrm... I clicked 'Retry' to re-run the test. Maybe that will sort it out.
18:37dj-death: idr: usually no
18:37dj-death: idr: there is probably the hash somewhere in the log
18:37idr: I don't think it will change the rendered results. :) I'm just hoping it will fix the broken image link.
18:37dj-death: actual: b1c96546107d8a7c01efdafdd0eabd21
18:37dj-death: expected: 5bc82f565a6e791e1b8aa7860054e370
18:38dj-death: interesting, none of those are on main it seems :)
18:39dj-death: ah no
18:39dj-death: I have an out-of-date one
18:40dj-death: idr: you see that a NIR change updated that hash :)
18:40dj-death: that trace should probably use a human perceptible difference
18:40dj-death: it's an option for imagemagick, that might be the solution to this
18:45idr: Blarg.
18:45idr: Didn't change anything.
18:48idr: anholt, daniels ^^^ Suggestions?
18:57daniels: DavidHeidelberg[m]: ^ help pls?
19:03DavidHeidelberg[m]: idr: looks good, we don't have an uploaded ref image, so it's harder to compare :(
19:04idr: Bummer. :(
19:04idr: Okay... I'll just update the hash and move on.
19:04idr: DavidHeidelberg[m], daniels, dj-death: Thanks.
19:04DavidHeidelberg[m]: Maybe it got dropped w/ some migration, for some hashes it happened i think
19:07Sachiel: wouldn't it make more sense to disable that trace then?
19:08DavidHeidelberg[m]: If I follow right, we just don't have a reference screenshot; the trace is fine
19:08DavidHeidelberg[m]: is on the phone right now, so rechecking irc history again
19:09idr: DavidHeidelberg[m]: Correct.
19:09idr: The trace runs fine and produces a result. When the result hash doesn't match the expected hash, you don't get to see an image of what is expected. You only see the "after" image.
19:10DavidHeidelberg[m]: Btw. yes, in the worst case of doubt you can look at a different hash from different HW which is not HTTP 404 (I did that a few times), but from what I recall the screenshot looked right
20:18DemiMarie: Does a GPU reset mean that something went wrong with the GPU hardware, firmware, or driver? Or are some GPUs still unable to cleanly recover from faults, timeouts, etc without resetting the whole GPU?
20:25robclark: gpu reset can be anything, but usually it amounts to usermode driver did something wrong (which could potentially involve not working around a hw/fw limitation)
20:38Lynne: sometimes it could mean "user program did something wrong"
20:43robclark: true.. especially with faults
20:43robclark: (but at least for drm/msm we don't reset the gpu on mem faults.. unless the gpu hangs or generates hw fault)
20:59penguin42: thinks he's seen it on Radeon when his shader has screwed up badly
21:02DemiMarie: robclark: obviously a bad shader can cause the GPU to fault, but I was hoping that the impact of that would be contained to whichever userspace process submitted the buggy shader. Being able to reset the GPU seems analogous to a buggy unprivileged userspace process causing a kernel panic, which would obviously be an OS or hardware problem.
21:03DemiMarie: Are GPU hardware and drivers just not at that level of robustness yet?
21:10ccr: will they ever be
21:11Lynne: penguin42: it's not an achievement, I can crash both intel and radeon cards
21:12Lynne: though when intel resets, you barely notice these days, but when radeon goes, sometimes not even a reisub is enough
21:13Lynne: what would be an achievement would be to cause nvidia gpus to crash in a way you'd notice, so far even running the dirtiest decode/subgroup/oob code I haven't been able to
21:18idr: DavidHeidelberg[m]: Is there something I can do to get a reference image added? So this doesn't happen to the next person.
21:19DavidHeidelberg[m]: last time I looked, it's automated somehow, but I can look into it again
21:24anarsoul: are there any guarantees in NIR about store output intrinsics regarding their location?
21:24idr: Okay. That would make sense. Hopefully changing the expected checksum will trigger that.
21:25anarsoul: basically I need to combine 2 store_output intrinsics into a single store_zs_output, since Z and S outputs are written at once on Utgard (i.e. lima)
21:25anarsoul: i.e. something similar to pan_nir_lower_zs_store()
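(A rough skeleton of the kind of pass anarsoul describes, patterned on panfrost's pan_nir_lower_zs_store; "store_zs_output_lima" is a hypothetical intrinsic the lima backend would define, and the nir_builder code to emit it is elided since its details depend on the Mesa tree:)

```c
#include "nir.h"

/* Find the fragment shader's depth and stencil store_output intrinsics so
 * they can be fused into one backend-specific store. */
static bool
lima_nir_lower_zs_store(nir_shader *shader)
{
   if (shader->info.stage != MESA_SHADER_FRAGMENT)
      return false;

   nir_function_impl *impl = nir_shader_get_entrypoint(shader);
   nir_intrinsic_instr *z_store = NULL, *s_store = NULL;

   nir_foreach_block(block, impl) {
      nir_foreach_instr_safe(instr, block) {
         if (instr->type != nir_instr_type_intrinsic)
            continue;
         nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
         if (intr->intrinsic != nir_intrinsic_store_output)
            continue;
         nir_io_semantics sem = nir_intrinsic_io_semantics(intr);
         if (sem.location == FRAG_RESULT_DEPTH)
            z_store = intr;
         else if (sem.location == FRAG_RESULT_STENCIL)
            s_store = intr;
      }
   }

   if (!z_store && !s_store)
      return false;

   /* Emit a single store_zs_output_lima carrying both values (substituting
    * a well-defined default for whichever store is missing), then remove
    * the original intrinsics, the same shape as store_combined_output_pan
    * in pan_nir_lower_zs_store. nir_builder code elided. */
   return true;
}
```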
21:31robclark: DemiMarie: we replay the jobs from other processes queued up behind the faulting job, so no impact to other processes
21:31DemiMarie: robclark: ah, do GPUs not have more than one job executing at once?
21:32DemiMarie: and what about with firmware scheduling where many jobs can be scheduled at once? hopefully one job faulting does not bring down all of them.
21:34DavidHeidelberg[m]: usually the driver survives it :)
21:35robclark: with fw scheduling, the fw would have to do the equiv thing.. kill the job that crashed and replay the others (possibly with help of kernel? Not really sure, I don't yet have fw sched)
21:54DemiMarie: robclark: why “replay” as opposed to “allow to continue”?
21:54DemiMarie: I’m probably missing something obvious here
21:55DemiMarie: wishes there were a book that explained how modern GPUs work internally
21:55robclark: well, it could be either.. "replay" is the implementation detail.. since we've reset the gpu.. maybe if something had a way to reset the "gpu" part of the gpu without resetting the fw scheduler it could simply be "allowed to continue"
21:56robclark: it's just an implementation detail
22:19DemiMarie: Okay so I am definitely misunderstanding something.
22:25robclark: I mean, the details might differ per gpu, but it amounts to "kill the bad job, let the others proceed"
22:30DemiMarie: thanks!
23:22DavidHeidelberg[m]: zmike: except that when not cached, your csgo trace loads for like 3 minutes; it seems to behave stably in CI and it's pretty complex, so thanks again! (also I force pushed the compressed version without any changes)
23:33nightquest: DemiMarie: I remember someone in #radeon had similar question (ie. "how GPUs work internally") and this link came up: https://www.rastergrid.com/blog/gpu-tech/2022/02/simd-in-the-gpu-world/ - sorry if it's not entirely relevant to the question, hope somehow this can help you
23:37DemiMarie: nightquest: I’m somewhat familiar with how GPUs execute instructions (the “data plane”, so to speak), but how they are _managed_ is much less well documented.
23:41DemiMarie: Before a GPU can execute any user instructions, page tables need to be set up, the MMU needs to be pointed at the right page table root, textures need to be bound, etc. On a CPU the equivalent operations would be done by code executing in a privileged mode of the CPU, but my understanding is that GPUs generally don’t have such a thing, so something else needs to do that.
23:44DemiMarie: Such details are only really relevant to two groups of people: driver writers (most people in this chat) and those who want to understand what happens when stuff goes wrong (me!).
23:56nightquest: Yes, I assumed this is kinda "elementary" stuff for folks here. But I'm glad I replied, as you have given very nice introduction to this article for me. Thanks!