IRC Logs of #dri-devel on irc.oftc.net for 2024-02-23

09:00 pq: tzimmermann, what do you think of using fbdev UAPI to drive keyboard RGB leds? :-p
09:00 tzimmermann: pq, wat? go away!
09:00 pq: lol
09:01 tzimmermann: wasn't the discussion about auxdisplay?
09:02 pq: yeah, I saw fbdev code in the auxdisplay driver mentioned.
09:02 * ccr nukes RGB leds from the orbit
09:02 pq: is there another UAPI for auxdisplay, too?
09:04 pq: I couldn't tell if cfag12864b.c had any UAPI in it, but cfag12864bfb.c seems to use fbdev things? Are they parts of the same driver, or two separate drivers for the same thing?
09:05 tzimmermann: auxdisplay is "all the rest" that didn't fit anywhere else AFAICT
09:05 pq: just wondering and stirring the pot, no big deal for me :-)
09:06 tzimmermann: i've just glanced over that discussion. OMG
09:07 tzimmermann: please let us not treat keyboard leds like regular displays
09:07 pq: :-D
09:07 pq: btw. kernel docs say: "The cfag12864bfb describes a framebuffer device (/dev/fbX)."
09:07 tzimmermann: fbdev and drm should be reserved for displays that show the user's console or desktop
09:08 tzimmermann: but not some status information or blinky features
09:09 tzimmermann: pq, indeed. some of the auxdisplay HW seems to be some kind of led device. so there's an fbdev device for it. whether that makes sense is questionable
09:22 tzimmermann: i think jani made a good point about handling these leds in the input subsys
14:50 mareko: karolherbst: if it's useful, radeonsi could do SVM where CPU pointer == GPU pointer
14:51 mareko: karolherbst: we can implement pipe_screen::resource_from_user_memory to do that by default with the amdgpu kernel driver, or based on a flag
15:20 karolherbst: mareko: yeah.. that's how I plan to implement non-system SVM
15:21 karolherbst: I have a prototype based on iris, but it blew up the application's VM
15:21 karolherbst: kinda need to find some time and properly think it all through
15:22 karolherbst: the biggest issue is just how to synchronize the VMs on both sides properly
15:24 karolherbst: like.. if the driver allocates a bo, which could be used for global memory, it probably also needs to `mmap` at the same location on the CPU side. Or like mmap on the CPU first and then just place the bo at the same location on the GPU side
15:25 karolherbst: and my plan was to add a "SVM" flag to pipe_resource_Flags so the driver knows it's a SVM thing or so
15:27 karolherbst: sadly, the story for discrete GPUs is way more complex, because you obviously don't want to operate on host memory, just mapped on both sides and I didn't even get to the point where I'd do memory migration
15:42 DemiMarie: robclark: if kernel submission does not protect the GPU or its firmware in any way, then userspace submission is an improvement!
15:45 robclark: no, I think more likely it is just a false sense of security, tbh
15:58 DemiMarie: robclark: I see! Do GPU vendors generally do a decent job at writing firmware?
15:59 DemiMarie: robclark: how hard will it be to proxy the doorbells? It is strictly unsafe to pass MMIO to a VM under Intel unless the MMIO behaves like memory, in that reads return the just-written value and both reads and writes complete in bounded time.
15:59 robclark: well, there is a pretty wide range of what can be called firmware, ranging from things that have some sort of RTOS to things that are somewhat more limited
16:00 robclark: but on-gpu escapes are much rarer than more mundane UAF type bugs
16:01 mareko: karolherbst: what I meant is that we can assign any GPU address to any buffer if the address range is unused, and the whole address range used by the CPU is always unused because our GPU allocations choose addresses that CPU allocations wouldn't use, and that's for SVM. For resource_from_user_memory, we can use the CPU pointer as the requested GPU address for the buffer, which is the most trivial case.
16:01 mareko: resource_create is more involved because you would have to pass the desired GPU address to it.
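The address scheme mareko describes can be sketched as a toy model. This is not radeonsi or amdgpu code: `GpuVaSpace`, `CPU_VA_LIMIT`, `alloc` and `alloc_at` are hypothetical names, and the 47-bit CPU range is an assumption chosen for illustration. It only shows the idea that driver-chosen GPU addresses stay out of the range the CPU can use, so a CPU pointer can always be requested as a GPU address.

```python
# Toy model only: real VA management lives in the winsys and the kernel.
# Every name and constant below is hypothetical.

CPU_VA_LIMIT = 1 << 47   # assumed top of the CPU-usable user address range
PAGE = 4096


def _round_up(size):
    return -(-size // PAGE) * PAGE


class GpuVaSpace:
    def __init__(self):
        self.ranges = []                # (start, end) of every GPU mapping
        self.next_auto = CPU_VA_LIMIT   # driver-chosen addresses start here

    def _overlaps(self, start, end):
        return any(start < e and s < end for s, e in self.ranges)

    def alloc(self, size):
        """Driver-chosen address: never inside the CPU-usable range."""
        size = _round_up(size)
        start = self.next_auto
        while self._overlaps(start, start + size):
            start += PAGE               # skip explicitly placed buffers
        self.next_auto = start + size
        self.ranges.append((start, start + size))
        return start

    def alloc_at(self, addr, size):
        """Caller-chosen address, e.g. a CPU pointer (the SVM case)."""
        size = _round_up(size)
        if addr % PAGE or self._overlaps(addr, addr + size):
            raise ValueError("address range unavailable")
        self.ranges.append((addr, addr + size))
        return addr
```

In this model, `alloc_at(cpu_ptr, size)` plays the role of resource_from_user_memory using the CPU pointer as the GPU address, while `alloc` stands for ordinary BO allocations.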
16:02 robclark: hmm, I'm not entirely familiar w/ the doorbell issue.. I would have expected it to work because that is basically how sr-iov works (although maybe not on past/current devices, idk)
16:04 karolherbst: mareko: like.. if I'd allocate a pipe_resource and would map it, the mapped address would also need to be the same as seen on the GPU
16:04 karolherbst: but
16:05 karolherbst: if the driver can promise, that addresses of GPUs bos are either reserved or won't be able to be used by CPU allocators, that might be good enough. There is then the question of how would synchronization work between the host and the GPU
16:06 mareko: like I said, our GPU BOs use address that CPU allocators wouldn't use
16:07 mareko: *addresses
16:07 karolherbst: but it also highly depends if we are talking about system SVM or not here. For non system SVM the allocations are explicit. For system SVM any CPU pointer needs to be valid also for the GPU
16:07 karolherbst: like.. wouldn't or won't use?
16:08 mareko: won't
16:08 karolherbst: who is managing the VM for radeonsi btw? Is that the kernel or is it done in userspace?
16:09 mareko: our GPU VM design is that all CPU addresses that the process can use are currently never used by GPU allocations
16:09 mareko: so the kernel could mirror the whole process address space
16:09 mareko: into the GPU
16:09 karolherbst: okay
16:09 karolherbst: yeah, that sounds more like it's designed to implement system SVM things :)
16:10 mareko: it seems, but no
16:10 mareko: amdkfd does that mirroring, while amdgpu requires explicit VM map calls
16:10 karolherbst: so I guess host allocations still need to be "imported" via userptrs or something
16:12 karolherbst: mareko: I think I just have two questions then: 1. if I map a pipe_resource, can the mapped CPU pointer be the same as the GPU address? and 2. Could I allocate a `pipe_resource` in a way that it's placed at a given address?
16:12 karolherbst: like..
16:12 karolherbst: given address as "on the CPU side there is an allocation I want to mirror inside VRAM"
16:13 karolherbst: there is a USM extension which allows for very explicit placement and migration, so I might want to have the ability to move memory between system RAM and VRAM, but placed at the same address on both sides
16:13 mareko: it's about page table mirroring, VRAM or GTT placement doesn't matter
16:13 mareko: 2 is trivial, you can choose the GPU address for any created pipe_resource
16:13 mareko: and any imported pipe_resource
16:14 karolherbst: okay, and it can also be an address which already exists on the CPU's side?
16:15 mareko: yes
16:15 karolherbst: okay, yeah, that should be good enough then
16:15 mareko: I don't know about 1 since we assign GPU addresses that the CPU process wouldn't use, so I don't if mmap can even use them
16:15 mareko: *I don't know
16:17 karolherbst: 1 is a hard requirement by CL sadly. I could do the reverse way: allocate on the host and then userptr import it, but... how would I get the memory to be migrated into VRAM?
16:17 mareko: you wouldn't
16:18 mareko: 1 is only dependent on mmap being usable, not on the driver
16:19 karolherbst: yeah :) that's the problem and why I'd like to allocate a pipe_resource instead and just make sure the address to which it gets mapped is the same. `mmap` does allow you to specify where you want to map something though
16:19 karolherbst: but it's not guaranteed to succeed afaik
16:19 karolherbst: but that's something I could play around with
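A minimal, Linux-only sketch of the "map at a chosen address, but it may fail" behaviour karolherbst mentions, calling libc's `mmap` through `ctypes`. Assumptions: anonymous memory stands in for a BO (a real driver would map the DRM fd at the BO's mmap offset), `map_at` is a hypothetical helper, and the `MAP_FIXED_NOREPLACE` value is the asm-generic one, which a few architectures override.

```python
import ctypes
import mmap

# dlopen(NULL): gives access to the libc symbols of the running process.
libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

MAP_FAILED = ctypes.c_void_p(-1).value
# Assumption: asm-generic value (x86, arm64, ...); some architectures differ.
MAP_FIXED_NOREPLACE = 0x100000


def map_at(addr, length):
    """Map anonymous memory at exactly `addr` (or anywhere if addr is None).

    Returns the mapped address, or None if the range could not be used.
    """
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    if addr is not None:
        # Fail instead of silently replacing an existing mapping.
        flags |= MAP_FIXED_NOREPLACE
    got = libc.mmap(addr, length, mmap.PROT_READ | mmap.PROT_WRITE,
                    flags, -1, 0)
    if got == MAP_FAILED:
        return None
    if addr is not None and got != addr:
        # Kernels that do not know the flag treat addr as a mere hint.
        libc.munmap(got, length)
        return None
    return got
```

Requesting a free, page-aligned address should hand that address back; requesting a range that is already mapped returns `None`, which is the "not guaranteed to succeed" case. Plain `MAP_FIXED` would instead replace whatever was mapped there, which is rarely what an SVM implementation wants.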
16:19 mareko: if you use a normal BO and you access it with a CPU and the BO is in invisible VRAM, it will cause a CPU page fault and the kernel will migrate it to GTT and keep it there
16:21 karolherbst: could I force the migration without having to access it? Or would I have to touch every page? Or just one page?
16:21 karolherbst: like.. if I can read at offset 0x0 and it would migrate the entire allocation that's good enough
16:21 karolherbst: though I don't really need that, as memory migration is just a hint on the API level
16:21 karolherbst: explicit migration I mean
16:22 mareko: the migration is forced by touching the page with a CPU, and it migrates the whole buffer
16:22 karolherbst: okay
16:22 mareko: recent CPUs and BIOSes allow all VRAM to be visible
16:23 karolherbst: so the only thing to figure out would be the mapping thing then. But yeah, a driver guaranteeing that bo's won't overlap with CPU allocation is indeed a big help
16:24 karolherbst: or rather, with mappings in general
16:29 MrCooper: note that any CPU reads from VRAM will throw you off a performance cliff
16:31 karolherbst: yeah, that's why I want to be able to have allocations on both sides at the same address
16:31 karolherbst: so I can do explicit migrations
16:55 MrCooper: https://gitlab.freedesktop.org/drm/amd/-/issues/3195 looks like birdie is going for the triple crown of getting banned on LWN, Phoronix and fdo GitLab
16:57 CounterPillow: didn't know Phoronix even banned people
17:00 CounterPillow: >What would need to happen would be for the media player to be able to ask the compositor if it can just hand it the raw YUV video data. If the compositor supports that and uses display planes to handle it, then the media player can just share the YUV images rather than RGB, cutting out the GFX work in the media player.
17:00 CounterPillow: mpv already has a VO for this (dmabuf_wayland)
17:00 karolherbst: impressive
17:01 CounterPillow: it's a fairly low amount of code last I checked
17:04 CounterPillow: obviously, it will never be the default, because both compositors and hardware are too spotty with their implementations, and it likely uses a lower quality scaler than mpv's current defaults, and also iirc it currently requires hwdec which is also unlikely to be turned on by default judging by how often AMD manages to find ways to break it
17:05 MrCooper: CounterPillow: "banned" in quotes, he's been posting as "avis" ever since the birdie account was banned, doesn't even try to hide it's him, but nothing happens
17:06 CounterPillow: heh
17:06 karolherbst: anyway.. if people are under the impression that users are wasting time or are in other ways disrespectful, we can certainly discuss this, but I haven't really seen much from birdie on gitlab besides maybe wasting time or making some out of place remarks
17:06 MrCooper: even so, a Phoronix "ban" is some kind of achievement I guess
17:07 MrCooper: karolherbst: yeah I was joking, let's hope he's not just getting warmed up though
17:07 karolherbst: nah
17:07 CounterPillow: yeah it's probably better not to shittalk people here even if they deserve it
17:07 karolherbst: birdie has been active on the gitlab for years now, so I guess it's fine
17:08 karolherbst: well.. "active"
17:12 MrCooper: CounterPillow: if he doesn't want to get called out for what he's posting on the Phoronix forums, he can always stop, we'd all be better off for it
17:12 CounterPillow: Personally I simply do not read places with a high frequency of bad posts
17:15 Lynne: 9a00a360ad8bf0e32d41a8d4b4610833d137bb59 causes segfaults on wayland
17:15 Lynne: is pelloux here to discuss? I'd rather not send a revert MR
17:18 MrCooper: pepp: ^
17:20 pepp: Lynne: annoying. Do you have more details?
17:23 Lynne: mpv and firefox crash instantly, in libvulkan_radeon
17:23 Lynne: running on sway with the vulkan backend
17:24 Lynne: wlroots generally causes programs to do a swapchain rebuild twice in a quick succession on init, maybe it's related to this?
17:28 pepp: Lynne: I guess it's missing a "if (chain->wsi_wl_surface)" check
17:29 DemiMarie: Regarding SVM: SVM can be perfectly compatible with virtio-GPU because while the guest userspace program doesn’t make any explicit requests to make memory accessible to the GPU, the guest kernel can make these requests to the host.
17:30 DemiMarie: CounterPillow: is hardware decoding unreliable under desktop Linux?
17:30 CounterPillow: yes
17:30 DemiMarie: CounterPillow: why?
17:31 CounterPillow: bugs
17:31 DemiMarie: in what?
17:31 CounterPillow: the driver
17:31 Lynne: pepp: can confirm that fixes it
17:32 DemiMarie: Is this because of distributions shipping old versions of Mesa?
17:32 CounterPillow: no
17:32 CounterPillow: new bugs are added all the time
17:32 DemiMarie: What makes it more reliable under Windows/macOS/ChromeOS/Android/etc?
17:33 pepp: Lynne: thx, I'll open a MR soon
17:34 CounterPillow: never said it was more reliable there, but for Android/macOS/ChromeOS it's definitely more engineering resources invested
17:35 DemiMarie: I thought ChromeOS just used upstream drivers.
17:35 CounterPillow: Not always true, and most importantly they usually do not ship AMD hardware as far as I know
17:36 DemiMarie: Are Intel’s drivers more reliable?
17:36 CounterPillow: I don't know since I don't use Intel, but they sure seem to be judging by the number of mpv bugs opened concerning vaapi misbehaving
17:37 mareko: karolherbst: radeonsi can change BO placement, but if there is not enough memory, it's only a hint
17:39 mattst88: CounterPillow: we have AMD Chromebooks nowadays, FYI
17:40 mattst88: they're using the video encode/decode drivers in Mesa
17:40 CounterPillow: Boy I sure do hope they pre-validate all input then because you can get 100% repeatable GPU resets by feeding AMD's VCN corrupt H.264 streams
17:41 mattst88: I don't work on the AMD stuff directly, but from what I've heard the video driver stability is not great
17:42 mattst88: e.g. we split out radv into a separate package that can be updated independently of the radeonsi driver (which is great) and the video driver (which is not great, AFAIK)
17:43 robclark: _eventually_ we will have some gitlab ci for amd video.. IIRC it is still blocked on some deqp-runner MR
17:45 CounterPillow: That'd be great, especially if it tests all still relevant generations of VCN (one of the more frustrating parts is reporting a bug and then being told that the AMD engineers don't have that hardware to reproduce it on)
17:46 robclark: it would be re-using the existing gitlab ci farms... I know we've sent some amd chromebooks to the collabora farm, but I doubt it is exhaustive.
17:46 CounterPillow: :(
17:47 robclark: someone sufficiently motivated and w/ enough hw is ofc welcome to host their own ci farms to expand the hw coverage
17:48 CounterPillow: I am planning on setting up a lava lab eventually but it does feel a bit silly that AMD's driver team does not have access to AMD's hardware
17:49 robclark: 🤷
17:49 DemiMarie: CounterPillow: is it mostly old hardware?
17:50 robclark: it would ofc be nice if hw vendors ran or sponsored ci farms.. but it can quickly turn into a large project depending on how far back in # of gens you go
17:50 CounterPillow: In my case, AMD Picasso isn't *that* old, and no, mpv has just had a bug filed caused by a 7900 XT's hardware decoder which is the current gen
17:51 CounterPillow: I've seen the "sorry we don't have that hardware" response for issues reported on 6xxx series cards, i.e. previous gen
17:51 DemiMarie: Oh dear
17:52 DemiMarie: Seems like they only support the most recent hardware generation.
17:52 CounterPillow: it's not a matter of policy, I don't think
18:04 abhinav__: daniels Hi, GM. Just wanted to check with you on https://gitlab.freedesktop.org/freedesktop/freedesktop/-/issues/1193 . If this is still the right way to submit this request
18:08 robclark: abhinav__: I think daniels is migrating drm-misc to gitlab so that shell accounts will no longer be needed
18:10 abhinav__: robclark yes, that's why I wanted to check whether the old process of applying for committer access still holds true or what would be the new method .... as only existing committers will be migrated to gitlab not new ones
18:13 robclark: I think just the last step changes, to gitlab permissions instead of account creation.. hmm, and I guess you just need to configure your ssh pub key in gitlab. Otherwise the process should be the same
18:16 abhinav__: robclark got it, Yes I have already uploaded my pub keys to my gitlab account ...
18:18 daniels: yeah, it would just be gitlab permissions so you don't need to fill out most of that form
18:18 daniels: robclark: afaik we only have stoney
18:20 abhinav__: daniels got it, so approvals will still happen on that form i assume though?
18:23 daniels: mripard wanted to try out the 'access request' button, but don't worry, we'll give you access :) and it should be moved early next week
18:32 abhinav__: daniels thanks :)
18:37 mareko: karolherbst: we could also make radeonsi use amdkfd to get system SVM, but it would need a new winsys
18:39 karolherbst: yeah... system SVM makes implementing all this stuff way easier, but I don't have anything which actually requires it
18:40 karolherbst: normal SVM is used by SYCL or chipStar (hip on CL), so that's why it's relatively important to support at some point
18:51 llyyr: https://gitlab.freedesktop.org/mesa/mesa/-/commit/9a00a360ad8bf0e32d41a8d4b4610833d137bb59 this commit breaks nearly every wayland native application that uses vulkan, including applications I launch with MESA_LOADER_DRIVER_OVERRIDE=zink
18:52 llyyr: can reproduce with mpv --gpu-api=vulkan (and plplay), as well as "MESA_LOADER_DRIVER_OVERRIDE=zink ffplay [video]"
19:03 llyyr: fixed by this diff https://0x0.st/H5t4.txt
19:03 llyyr: I'll open a MR if that looks right
19:09 ity: Hi, hopefully a quick question, does libglx only allow access to the GPU that the X11 Server is running on?
19:10 ity: Slash is there something in the DRI protocol for choosing between GPUs that the X11 server is connected to
19:18 daniels: llyyr: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27767
19:18 llyyr: ah
19:28 agd5f: CounterPillow, we have access to all generations of hardware in general at least at engineering board level. OEM specific platforms or boards are a different matter.
19:28 CounterPillow: agd5f: then it is strange to me that I've seen "I don't have access to this hardware" as a response to something multiple times, referring to non-OEM models.
19:28 CounterPillow: There seems to be some breakdown in communication
19:29 agd5f: CounterPillow, do you have an example?
19:29 CounterPillow: No, I don't have links from 6 months ago handy
19:29 agd5f: CounterPillow, not every engineer has every board, but as a team, we have a hardware library where you can get the boards
19:32 agd5f: CounterPillow, that said, we have remote developers and it's not always feasible to send them one of every board so sometimes we need to reach out to someone in the office to repro issues, etc. which can take time
19:35 CounterPillow: agd5f: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9497#note_2248945 (not sure if this person is an AMD employee though)
19:41 CounterPillow: looks like they are
19:41 agd5f: CounterPillow, Thong is an AMD employee and he never said he didn't have access to the hardware.
19:46 DemiMarie: Do AMD kernel drivers have problems recovering from GPU resets?
19:48 CounterPillow: yes
19:48 agd5f: DemiMarie, they can, depending on the nature of the hang and hardware involved
19:48 DemiMarie: What is the reason for this?
19:49 DemiMarie: agd5f: will this be fixed in the future?
19:49 CounterPillow: I've never seen amdgpu recover successfully on either picasso or a 7900 XT or the zen4 igpu
19:49 llyyr: DemiMarie: just leave a h264 video playing with vaapi for 5-6 hours, you'll almost definitely get a reset within that time on an RDNA 2/3 gpu
19:49 llyyr: a reset that it doesn't recover from, that is
19:50 agd5f: DemiMarie, It's mostly older hardware. newer stuff should be in pretty good shape
19:51 DemiMarie: For context: this makes supporting AMD GPUs in virtualization use-cases (with virtio-GPU native contexts) significantly less appealing.
19:51 DemiMarie: agd5f: is it possible to just do a full GPU reset, wipe out everything in VRAM, give everyone a context lost error, and continue?
19:51 agd5f: DemiMarie, yes
19:52 agd5f: but most userspace doesn't handle context lost so even if the kernel resets everything, userspace is left in a bad state
19:52 DemiMarie: agd5f: does that mean that the zen4 iGPU not recovering is a bug?
19:52 airlied: daniels: Linus has merged my tree, i can give you a week :-)
19:53 DemiMarie: agd5f: I see, so that is a bug in all sorts of userspace programs?
19:53 ccr: uhh.
19:53 * DemiMarie wonders if non-robust contexts should cause SIGABRT when a context loss happens
19:54 agd5f: DemiMarie, right. On other OSes, the desktop environment is robustness-aware and if it sees a context lost, it rebuilds its state, creates a new context and continues
19:55 DemiMarie: agd5f: I guess that means that bugs should be reported against various Wayland compositors.
19:58 DemiMarie: agd5f: what is the status of LeftoverLocals mitigations for AMD GPUs?
19:58 agd5f: DemiMarie, in progress
20:00 DemiMarie: agd5f: does the hardware make it quite difficult?
20:00 DemiMarie: IIRC Google shipped something for ChromeOS.
20:00 DemiMarie: Will there be a way to enforce the mitigations at the kernel driver level?
20:02 agd5f: DemiMarie, I don't think I'm at liberty to discuss the details at this point
20:03 DemiMarie: agd5f: will the details be made available in the future?
20:03 agd5f: yes
20:06 DemiMarie: Context: I’m going to be working on GPU acceleration for Qubes OS and working LeftoverLocals protection, preferably at the kernel driver or firmware level, is a hard requirement there.
20:07 DemiMarie: The reason the location of the mitigations matters is that the userspace driver will be running in the guest, which is not trusted.
20:23 zamundaaa[m]: <CounterPillow> "I've never seen amdgpu recover..." <- I've seen it recover correctly lots of times with a 6800XT, and also once with a 7900XTX (only reset that has happened on it so far, triggered by Doom Eternal)
20:24 zamundaaa[m]: You just need to use the one compositor that supports recovering from GPU resets :)
20:24 zmike: weston ?
20:25 zamundaaa[m]: KWin
20:25 daniels: obviously weston uses the gpu so perfectly that we never need to recover
20:26 CounterPillow: zamundaaa[m]: I use KWin, and I don't think the problem was the compositor considering dmesg kept getting spammed with amdgpu trying to reset
20:28 zamundaaa[m]: I have seen a reset loop happen once before as well, on the 6800 XT. I thought that was fixed though, it hasn't happened in a while
20:38 agd5f: I have my doubts as to whether this stuff will ever work very reliably on consumer level Linux in general just due to the nature of the ecosystem. There are tons of distros and they all use slightly different combinations and versions of components and no one can reasonably test all of those, plus all of the OSVs and IHVs focus the vast majority of their testing on their enterprise offerings.
20:42 CounterPillow: Ah, the good ol' "Linux is too diverse to support" excuse when it's surprisingly always your component that crashes.
20:47 DemiMarie: agd5f: The only things that should matter here are the KMD version and the firmware version
20:47 DemiMarie: And the hardware itself, obviously.
20:48 zamundaaa[m]: Mesa can also matter. Until recently, reset handling in RadeonSi was borked
20:49 agd5f: DemiMarie, and the compositor version and the mesa version and the LLVM version
20:49 CounterPillow: The long-haired Linux smellies are simply asking too much of us when we have to make sure we don't have bugs in our firmware, our kernel driver, and our userspace driver
20:50 zamundaaa[m]: agd5f: KWin has supported GPU resets for a loooong time
20:50 zamundaaa[m]: And it's been 90% functional almost all of that time. In the remaining cases it would just crash, which is still better than a hang
20:50 DemiMarie: zamundaaa: What is the consequence of that? Applications not being able to deal with `VK_ERROR_DEVICE_LOST`/`GL_CONTEXT_LOST` reliably?
20:51 zamundaaa[m]: There were two issues, one was that RadeonSi never reported the GPU reset as being over
20:51 DemiMarie: agd5f: userspace should not determine whether the KMD can reset the GPU successfully
20:52 DemiMarie: agd5f: So LLVM problems can be solved by either having Mesa bundle LLVM, or by having Mesa stop using LLVM to generate AMD GPU code.
20:53 zamundaaa[m]: The other one was related to shared contexts, and meant that after re-creating OpenGL state, KWin would still get the context reset reported by RadeonSi on the new context, despite everything being fine
20:53 agd5f: CounterPillow, there are combinations of components that work great and others that do not. Say what you will about windows or android, it's a lot easier to test once and verify that it will work everywhere. It's not feasible to test every combination of driver, firmware, rest of kernel, UMD, LLVM, compositor, etc.
20:53 agd5f: DemiMarie, and we should also bundle kernel and mesa and firmware into one repo as well if we really want to get solid
20:53 zamundaaa[m]: agd5f: GPU reset handling on the application side is luckily very simple, so there isn't a lot of variation
20:54 zamundaaa[m]: It's pretty much just if (hasResetHappened()) recreateEglContexts()
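The `if (hasResetHappened()) recreateEglContexts()` pattern can be sketched as a small simulation. Nothing here is KWin code or a real GL binding: `FakeContext`, `Compositor` and `reset_status` are hypothetical stand-ins, with `reset_status` playing the role that `glGetGraphicsResetStatus` has for a robust OpenGL context.

```python
# Simulation only; all classes and constants are hypothetical.
NO_ERROR = "no_error"
CONTEXT_RESET = "context_reset"


class FakeContext:
    def __init__(self, resets_pending=0):
        # Number of upcoming polls that will still report a reset.
        self.resets_pending = resets_pending

    def reset_status(self):
        if self.resets_pending:
            self.resets_pending -= 1
            return CONTEXT_RESET
        return NO_ERROR


class Compositor:
    def __init__(self, make_context):
        self.make_context = make_context
        self.ctx = make_context()
        self.recreated = 0

    def render_frame(self):
        """Return True if a frame was drawn, False if state was rebuilt."""
        if self.ctx.reset_status() != NO_ERROR:
            # The old context and everything created in it is lost:
            # create a fresh one and re-upload state before drawing again.
            self.ctx = self.make_context()
            self.recreated += 1
            return False
        return True
```

The sketch also makes the first RadeonSi issue mentioned above easy to picture: if the freshly created context keeps returning a reset status, `render_frame` recreates contexts on every frame and never draws.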
20:54 DemiMarie: agd5f: from my PoV, the obvious solution to this is fuzzing
20:54 agd5f: I'm not talking about GPU reset specifically, just general GPU stack stability. Like you can have a good combination of KMD and firmware, but if the UMD is bad, you'll just keep getting resets
20:55 DemiMarie: agd5f: what should distros do?
20:55 DemiMarie: always take the latest kernel and latest Mesa?
20:55 CounterPillow: not ship AMD code since they're the only ones with this recurrent quality problem
20:55 LaserEyess: you don't need to test every combination of driver, firmware, software. Test the upstream kernel and upstream mesa, and pick a DE to test, it doesn't matter
20:55 zamundaaa[m]: CounterPillow: comments like that really don't help
20:57 agd5f: LaserEyess, sure until distro X decides to pull in a new firmware or stick with an older mesa release, then you have an untested combination
20:58 LaserEyess: but that's not your problem, and if said distro is doing that then, well, they're doing something wrong
20:58 agd5f: LaserEyess, but that is what users use
20:59 LaserEyess: well I'm addressing the point of, for example, amdgpu bugs that are reproducible on drm-tip, or the stable linux kernel, or one of linus's -rc's
20:59 CounterPillow: Does breaking older user space with newer firmware count as an uapi break or is it fine because it's in firmware?
20:59 tleydxdy: who are these "users"?
21:00 tleydxdy: I doubt firmware change should affect anything beyond kmd
21:00 tleydxdy: if it did it's a kmd issue
21:12 DemiMarie: tleydxdy: those users are people using distros like Debian stable
21:12 tleydxdy: shouldn't they get support from debian?
21:12 tleydxdy: like amd is not in the position to do anything
21:13 tleydxdy: I would think the "users" for amd would be the upstream projects
21:13 tleydxdy: in that case there's only one support target: "tip of tree"
21:14 DemiMarie: but that has no humans actually using it, except for dev
21:14 robclark: DemiMarie: for LL bnieuwenhuizen made a mitigation that clears lmem in mesa.. configured via driconf. This is what we are shipping w/ CrOS but others are free to use it until we get something better from amd
21:14 tleydxdy: like the other reports help catch bugs that's good, but it's unrealistic to fully support them
21:15 tleydxdy: I can spin up a distro tomorrow that only ships known bad configs from vendor X and X would need to support my users?
21:15 DemiMarie: tleydxdy: “you have to be running this development version to get help” is not reasonable to expect from end-users
21:16 tleydxdy: yes, but amd also can't ship packages to debian stable
21:16 tleydxdy: so debian stable need to fix the issue
21:16 tleydxdy: not amd directly
21:16 tleydxdy: and if the fix is in upstream, they can backport
21:17 tleydxdy: "try latest upstream" is a reasonable ask if you are reporting an issue to upstream
21:17 LaserEyess: DemiMarie: the distro is the user, for example ubuntu. When you get a bug on ubuntu, you report it to their issue tracker, and a developer there should be your primary PoC. That developer should be the one coming to AMD if it's an AMD bug, and that developer should be able to run a development system
21:17 LaserEyess: in fact people pay canonical for that service
21:19 tleydxdy: I mean if you are paying money that's a different story, whoever took your money should make sure you get fixed
21:19 tleydxdy: if you pay amd a contract then sure, run hanamontana os and get direct support
21:20 LaserEyess: I'm talking about the support contracts that many linux vendors offer
21:20 LaserEyess: amd does not offer support for those, the linux vendors do
21:20 LaserEyess: even free distros have bug trackers
21:20 LaserEyess: it's the same thing, just with volunteer time and not a contract
21:20 DemiMarie: tleydxdy: “latest released kernel and Mesa” would be something that is realistic to expect at least some users to run
21:20 DemiMarie: “tip of tree” isn’t
21:20 DemiMarie: not least because IIUC neither Linux nor Mesa actually recommend running it
21:23 tleydxdy: well tip of tree might be a poor choice of words on my part. but I was pretty sure e.g. linux would want you to try linux-next at least
21:23 tleydxdy: if you report a bug directly there
21:27 tleydxdy: in any case I don't think hardware vendors should concern themselves with anything other than the latest upstream projects (i.e. direct consumer of their code) when it comes to test coverage. unless they got support contracts that mandate otherwise of course
21:29 agd5f: ROCm is super stable running RHEL with our packaged drivers. In that case we can make sure you are using a well validated combination of firmwares, driver code, and core OS components because both AMD and RH test the hell out of it. fedora, less so.
21:33 tleydxdy: yeah, give money to rhel might be the end lesson here
21:35 DemiMarie: agd5f: From what I have seen, Intel is stable on Fedora, too.
21:37 DemiMarie: What this sounds like to me is that the various interfaces are unstable.
21:37 tleydxdy: I sure hope fedora gets tested otherwise wouldn't every rhel update be a QA hell?
21:37 DemiMarie: tleydxdy: that kind of stuff is why Linux on the desktop has a bad reputation
21:37 DemiMarie: tleydxdy: to me, “latest upstream projects” means “latest release version”
21:40 DemiMarie: robclark: is this race-free? In other words, is it guaranteed that GPU preemption can’t happen before that command stream finishes?
21:41 robclark: it clears at the end of each shader, so as long as there isn't mid-shader preemption it should be ok
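Why "clear at the end of each shader" depends on there being no mid-shader preemption can be shown with a toy model. This is not the Mesa mitigation itself: `run_shader`, `leaked` and the list standing in for local memory are hypothetical, and preemption is modelled simply as the shader stopping before its epilogue.

```python
# Toy model; nothing here touches real hardware or driver code.

def run_shader(local_mem, secret, clear_at_end=True, preempt_mid_shader=False):
    """Victim shader: works in local memory, then optionally zeroes it."""
    for i, word in enumerate(secret):
        local_mem[i] = word              # shader body uses local memory
    if preempt_mid_shader:
        return                           # switched out before the epilogue
    if clear_at_end:
        for i in range(len(local_mem)):
            local_mem[i] = 0             # mitigation: clear before exit


def leaked(local_mem):
    """Attacker shader: reads local memory without initializing it."""
    return any(word != 0 for word in local_mem)
```

With the clear enabled and no preemption the next shader sees only zeroes; skipping the clear, or stopping the shader before the clear runs, leaves the previous data readable.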
21:46 DemiMarie: Is mid-shader preemption guaranteed not to happen?
21:49 robclark: better question for someone from amd but I wouldn't expect mid-shader preemption
21:49 robclark: ie. seems like it would be a hard thing to implement in hw
21:49 agd5f: DemiMarie, on AMD hardware mid-shader preemption is only supported on the user queues used by ROCm. Kernel managed queues are not preempted
21:51 DemiMarie: agd5f: is one reason that ROCm queues can be preempted that they do not have access to fixed-function blocks?
21:51 agd5f: only compute queues support mid-shader preemption. GFX is always at draw boundaries
21:52 agd5f: due to fixed function hardware
22:23 DemiMarie: I see.
22:23 DemiMarie: Hopefully future hardware will support preemption of fixed-function units.
22:25 DemiMarie: Right now it seems that GFX is a second-class citizen when it comes to robustness.