00:19DemiMarie: robclark: in that case the proxy in the guest needs to adjust its behavior, rather than blindly forwarding message. That's a huge change.
08:27jfalempe: sima: Can you take a look at my drm_panic for i915/xe series https://patchwork.freedesktop.org/series/141936/ ? Do you know who I should ping to have it reviewed?
08:31airlied: mlankhorst: ^
08:49mlankhorst: lets see
09:09mlankhorst: jfalempe: I'm looking through the code, is there an alternative for using per-file static variables?
09:10mlankhorst: It should be doable to pass intel_fb for panic setup instead, and pass that along instead of bo
09:13mlankhorst: Contrary to i915, xe has a very strict separation between xe_bo and display, so adding a member to intel_fb for panic stuff would make sense (or put it in xe's i915_vma)
09:13jfalempe: mlankhorst: I find static variable was the least intrusive, and due to the callbacks, it's hard to get back the data you need.
09:18mlankhorst: drm_panic is only ever being called once right?
09:22jfalempe: Yes, but the get_scanout_buffer is called for each plane of each GPU.
09:24mlankhorst: Yeah so on each gpu each primary plane gets set. Do other planes get disabled?
09:25jfalempe: No, currently I just overwrite the primary plane.
09:29jfalempe: To remove the static variable, I would need to add a private pointer in struct drm_scanout_buffer, like https://patchwork.freedesktop.org/patch/595261/?series=133963&rev=1
09:31mlankhorst: Yeah that would be doable, put a panic_data struct into intel_display or intel_fb and done
09:35sima: yeah +1 on void *private; instead of sprinkling globals around
09:37jfalempe: mlankhorst: thanks, I will do that in the v10.
09:38sima: jfalempe, btw with the tiling disable, I guess for testing this means we get splats in the state verifier?
09:38sima: or do we avoid those?
09:38sima: or still no igt for panic testing?
09:39jfalempe: When using the drm test panic, the image is still tiled, so mostly unreadable, but there are no warning in dmesg.
09:40jfalempe: I think it restore tiling when doing the next flip
09:40sima: ah yeah we just smash the same thing in again
09:40sima: I guess ideally we'd have special intel igt cases for all the special code (like the y/4 tiling)
09:41sima: otherwise this stuff will break
09:42mlankhorst: I thought there is the tiling handling now?
09:42jfalempe: Testing is pretty hard, due to the assumption we have to do for drm panic.
09:42mlankhorst: Lets see what happens if I put it on my machine and do the debugfs test O:)
09:43jfalempe: mlankhorst: when DPT is not used, I disable tiling (so on older hw), when DPT is set, I have to handle the tiling.
09:43mlankhorst: Yeah exactly
09:44mlankhorst: uapi: [FB:380] AR30 little-endian (0x30335241),0x100000000000009,1920x2160, visible=visible, src=1920.000000x2160.000000+0.000000+0.000000, dst=1920x2160+0+0, rotation=0 (0x00000001)
09:45sima: mlankhorst, I'm talking about testcases in igt
09:45sima: otherwise this stuff will break
09:45mlankhorst: sima: Yeah should be easy to add, only thing we can do is verify no errors occurred from hw, no idea if the results would look sane. :)
09:45sima: yeah can't validate the output, but at least make sure it doesn't blow up
09:45sima: which is like the most important thing of panic code
09:46jfalempe: Also the debug interface doesn't allow to see if you miss some "flush" commands, because you will continue to do page flips. (for example, I only noticed the psr2 issue with real panic).
09:46sima: jfalempe, yeah there's limitations, but we can at least exercise the code and make sure we run all the paths in CI
09:46mlankhorst: igt_display is a reasonably sane interface for testing
09:47sima: and if we go with an igt ideally it'd have a super basic one that uses dumb buffers
09:47sima: so that other drivers also have some automated testing for those that do have igt ci
09:47mlankhorst: thought I would also be interested into testing what happens with partial primary plane, etc
09:49jfalempe: yes, I only tested the basics, so panic in gnome desktop and in VT.
09:51mlankhorst: Or like Xorg, too big a primary plane, with parts not visible
09:51jfalempe: Also discrete GPU won't work yet, because I don't have a way to ioremap from the panic handler. But that can be added later, and iGPU are the vast majority in use.
09:55mlankhorst: There's no guarantee on small bar that ioremap will even succeed
09:58mlankhorst: should not be needed to ioremap
10:00sima: amd has peek/poke registers, dead slow but guaranteed to work
10:00sima: but I think we don't have those?
10:02kode54: I had fun when I tried to read a panic qr code from someone's web posting of a video capture output
10:02kode54: seems Apple's iOS QR code decoder chokes on that much data
10:03kode54: it tries to inline it into the tooltip that pops on the screen to let you open/select it
10:08jfalempe: kode54: Yes some QR code reader don't like bigger code. I use Binary eye on Android, and it works good (and is open-source https://github.com/markusfisch/BinaryEye)
10:09kode54: I think Apple's coder can handle it
10:09kode54: but they try to stuff the entire block of data into a tooltip control
10:11jfalempe: hum, yes that's a lot of digits, because they can be encoded efficiently in QR code and are valid URL. Using Base64 would be much shorter, but also waste 25% of data.
10:15kode54: I thought the QR code from the panic was the full text of the panic message?
10:15kode54: or does it also include the dmesg loop?
10:15kode54: since I've never had a panic successfully bring up the qr code
10:16kode54: every time I have a panic, it's a gpu related panic, and all my monitors cease outputting a picture when that happens :D
10:16psykose: i had it once and thought i got hacked because of the screen clear animation :-)
10:16kode54: though there was that period where I was testing bcachefs
10:17kode54: can't test GPU code at the same time as bcachefs, as bcachefs is still in a state of flux in current kernels, so down/upgrading will lead to applying down/upgrade to your filesystem
10:27mlankhorst: jfalempe: poking around, is it possible to include the private member for testing? Makes things easier. :)
10:30jfalempe: mlankhorst: I'll try to do the change this afternoon, and send a v10.
10:31jfalempe: kode54: the dmesg data are compressed, so you can put more of them. if you compile with QR_CODE set and QR_CODE_URL unset, you will have the dmesg in clear text in the QR code.
10:31kode54: gotcha
11:57mlankhorst: jfalempe: Yeah might make vram easier.
12:02mlankhorst: jfalempe: https://paste.debian.net/1380245/ btw if you want to spend some time on vram too
12:05mlankhorst: remove the xe_ttm_vram, was from original code
12:07mlankhorst: though need to put xe_ttm_resource_visible () check somewhere
12:08jfalempe: mlankhorst: thanks nice to have also a solution for vram, the problem is I don't have hardware to test. The only discrete intel card I have is a dg2 prototype.
12:10mlankhorst: I do
12:11mlankhorst: DG2 mostly runs on xe btw if you use force probe, only HuC doesn't load
13:11mlankhorst: hmm I added some debug for testing, it's way too realistic
13:12mlankhorst: https://i.imgur.com/c9nWOZ0.png
13:12mlankhorst: (tiled screen)
13:13jfalempe: I was wondering how you can panic only half of your display :)
13:15jfalempe: red screen of death, really frightening.
13:15mlankhorst: I think it's not completely working, but at least the mapping part does
13:16mlankhorst: https://paste.debian.net/1380313/
13:17mlankhorst: should probably export a function to get the vram base pointer and create or look for a xe_bo_mappable()
13:18mlankhorst: but it worked on my battlemage
13:23sima: mlankhorst, we really don't have peek/poke registers anywhere?
13:24mlankhorst: I haven't looked for them
13:24mlankhorst: But this works on everthing except small bar alread
13:29sima: yeah I guess assuming we have a big enough bar isn't that stupid any more
13:31mlankhorst: though I didn't check what happens if we keep the FB in stolen memory on recent hw
13:33mlankhorst: I would say it should still work, but that's always a special case
14:03jfalempe: mlankhorst: I've done the change to remove the static variables, moving them to struct intel_framebuffer. I won't have time clean and send that today, so I will finish it tomorrow.
14:06mlankhorst: No worries, feel free to re-use the work I did for VRAM
14:07jfalempe: my current test patch: https://paste.debian.net/1380319/
18:35zmike: mareko: if you want to re-ack https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35477