00:19 DemiMarie: robclark: in that case the proxy in the guest needs to adjust its behavior, rather than blindly forwarding message. That's a huge change.
08:27 jfalempe: sima: Can you take a look at my drm_panic for i915/xe series https://patchwork.freedesktop.org/series/141936/ ? Do you know who I should ping to have it reviewed?
08:31 airlied: mlankhorst: ^
08:49 mlankhorst: lets see
09:09 mlankhorst: jfalempe: I'm looking through the code, is there an alternative for using per-file static variables?
09:10 mlankhorst: It should be doable to pass intel_fb for panic setup instead, and pass that along instead of bo
09:13 mlankhorst: Contrary to i915, xe has a very strict separation between xe_bo and display, so adding a member to intel_fb for panic stuff would make sense (or put it in xe's i915_vma)
09:13 jfalempe: mlankhorst: I find static variable was the least intrusive, and due to the callbacks, it's hard to get back the data you need.
09:18 mlankhorst: drm_panic is only ever being called once right?
09:22 jfalempe: Yes, but the get_scanout_buffer is called for each plane of each GPU.
09:24 mlankhorst: Yeah so on each gpu each primary plane gets set. Do other planes get disabled?
09:25 jfalempe: No, currently I just overwrite the primary plane.
09:29 jfalempe: To remove the static variable, I would need to add a private pointer in struct drm_scanout_buffer, like https://patchwork.freedesktop.org/patch/595261/?series=133963&rev=1
09:31 mlankhorst: Yeah that would be doable, put a panic_data struct into intel_display or intel_fb and done
09:35 sima: yeah +1 on void *private; instead of sprinkling globals around
09:37 jfalempe: mlankhorst: thanks, I will do that in the v10.
09:38 sima: jfalempe, btw with the tiling disable, I guess for testing this means we get splats in the state verifier?
09:38 sima: or do we avoid those?
09:38 sima: or still no igt for panic testing?
09:39 jfalempe: When using the drm test panic, the image is still tiled, so mostly unreadable, but there are no warning in dmesg.
09:40 jfalempe: I think it restore tiling when doing the next flip
09:40 sima: ah yeah we just smash the same thing in again
09:40 sima: I guess ideally we'd have special intel igt cases for all the special code (like the y/4 tiling)
09:41 sima: otherwise this stuff will break
09:42 mlankhorst: I thought there is the tiling handling now?
09:42 jfalempe: Testing is pretty hard, due to the assumption we have to do for drm panic.
09:42 mlankhorst: Lets see what happens if I put it on my machine and do the debugfs test O:)
09:43 jfalempe: mlankhorst: when DPT is not used, I disable tiling (so on older hw), when DPT is set, I have to handle the tiling.
09:43 mlankhorst: Yeah exactly
09:44 mlankhorst: uapi: [FB:380] AR30 little-endian (0x30335241),0x100000000000009,1920x2160, visible=visible, src=1920.000000x2160.000000+0.000000+0.000000, dst=1920x2160+0+0, rotation=0 (0x00000001)
09:45 sima: mlankhorst, I'm talking about testcases in igt
09:45 sima: otherwise this stuff will break
09:45 mlankhorst: sima: Yeah should be easy to add, only thing we can do is verify no errors occurred from hw, no idea if the results would look sane. :)
09:45 sima: yeah can't validate the output, but at least make sure it doesn't blow up
09:45 sima: which is like the most important thing of panic code
09:46 jfalempe: Also the debug interface doesn't allow to see if you miss some "flush" commands, because you will continue to do page flips. (for example, I only noticed the psr2 issue with real panic).
09:46 sima: jfalempe, yeah there's limitations, but we can at least exercise the code and make sure we run all the paths in CI
09:46 mlankhorst: igt_display is a reasonably sane interface for testing
09:47 sima: and if we go with an igt ideally it'd have a super basic one that uses dumb buffers
09:47 sima: so that other drivers also have some automated testing for those that do have igt ci
09:47 mlankhorst: thought I would also be interested into testing what happens with partial primary plane, etc
09:49 jfalempe: yes, I only tested the basics, so panic in gnome desktop and in VT.
09:51 mlankhorst: Or like Xorg, too big a primary plane, with parts not visible
09:51 jfalempe: Also discrete GPU won't work yet, because I don't have a way to ioremap from the panic handler. But that can be added later, and iGPU are the vast majority in use.
09:55 mlankhorst: There's no guarantee on small bar that ioremap will even succeed
09:58 mlankhorst: should not be needed to ioremap
10:00 sima: amd has peek/poke registers, dead slow but guaranteed to work
10:00 sima: but I think we don't have those?
10:02 kode54: I had fun when I tried to read a panic qr code from someone's web posting of a video capture output
10:02 kode54: seems Apple's iOS QR code decoder chokes on that much data
10:03 kode54: it tries to inline it into the tooltip that pops on the screen to let you open/select it
10:08 jfalempe: kode54: Yes some QR code reader don't like bigger code. I use Binary eye on Android, and it works good (and is open-source https://github.com/markusfisch/BinaryEye)
10:09 kode54: I think Apple's coder can handle it
10:09 kode54: but they try to stuff the entire block of data into a tooltip control
10:11 jfalempe: hum, yes that's a lot of digits, because they can be encoded efficiently in QR code and are valid URL. Using Base64 would be much shorter, but also waste 25% of data.
10:15 kode54: I thought the QR code from the panic was the full text of the panic message?
10:15 kode54: or does it also include the dmesg loop?
10:15 kode54: since I've never had a panic successfully bring up the qr code
10:16 kode54: every time I have a panic, it's a gpu related panic, and all my monitors cease outputting a picture when that happens :D
10:16 psykose: i had it once and thought i got hacked because of the screen clear animation :-)
10:16 kode54: though there was that period where I was testing bcachefs
10:17 kode54: can't test GPU code at the same time as bcachefs, as bcachefs is still in a state of flux in current kernels, so down/upgrading will lead to applying down/upgrade to your filesystem
10:27 mlankhorst: jfalempe: poking around, is it possible to include the private member for testing? Makes things easier. :)
10:30 jfalempe: mlankhorst: I'll try to do the change this afternoon, and send a v10.
10:31 jfalempe: kode54: the dmesg data are compressed, so you can put more of them. if you compile with QR_CODE set and QR_CODE_URL unset, you will have the dmesg in clear text in the QR code.
10:31 kode54: gotcha
11:57 mlankhorst: jfalempe: Yeah might make vram easier.
12:02 mlankhorst: jfalempe: https://paste.debian.net/1380245/ btw if you want to spend some time on vram too
12:05 mlankhorst: remove the xe_ttm_vram, was from original code
12:07 mlankhorst: though need to put xe_ttm_resource_visible () check somewhere
12:08 jfalempe: mlankhorst: thanks nice to have also a solution for vram, the problem is I don't have hardware to test. The only discrete intel card I have is a dg2 prototype.
12:10 mlankhorst: I do
12:11 mlankhorst: DG2 mostly runs on xe btw if you use force probe, only HuC doesn't load
13:11 mlankhorst: hmm I added some debug for testing, it's way too realistic
13:12 mlankhorst: https://i.imgur.com/c9nWOZ0.png
13:12 mlankhorst: (tiled screen)
13:13 jfalempe: I was wondering how you can panic only half of your display :)
13:15 jfalempe: red screen of death, really frightening.
13:15 mlankhorst: I think it's not completely working, but at least the mapping part does
13:16 mlankhorst: https://paste.debian.net/1380313/
13:17 mlankhorst: should probably export a function to get the vram base pointer and create or look for a xe_bo_mappable()
13:18 mlankhorst: but it worked on my battlemage
13:23 sima: mlankhorst, we really don't have peek/poke registers anywhere?
13:24 mlankhorst: I haven't looked for them
13:24 mlankhorst: But this works on everthing except small bar alread
13:29 sima: yeah I guess assuming we have a big enough bar isn't that stupid any more
13:31 mlankhorst: though I didn't check what happens if we keep the FB in stolen memory on recent hw
13:33 mlankhorst: I would say it should still work, but that's always a special case
14:03 jfalempe: mlankhorst: I've done the change to remove the static variables, moving them to struct intel_framebuffer. I won't have time clean and send that today, so I will finish it tomorrow.
14:06 mlankhorst: No worries, feel free to re-use the work I did for VRAM
14:07 jfalempe: my current test patch: https://paste.debian.net/1380319/
18:35 zmike: mareko: if you want to re-ack https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35477