01:49 zmike: tarceri: wow you went through a battle to finish off that series
03:29 tarceri: yeah lol, it turned out reasonably clean in the end but the mixing on nir and the param stuff is a bit yuck
03:29 tarceri: *mixing of
07:37 mlankhorst: airlied: ah good thanks
07:39 tomba: I'm debugging weston crash (assert) when running with tidss. If I have this right, when tidss enables the crtc, drm_crtc_vblank_on() is called, and that leads to drm_reset_vblank_timestamp(), which does "t_vblank = 0;". tidss sends a vblank event with timestamp 0, and weston asserts that the new timestamp is bigger than the old one, and the new one in this case is 0, so smaller than the old one.
07:40 tomba: I can't figure out if this is a driver issue (but afaics it comes from the drm framework), or if weston should take timestamp 0 as invalid and ignore it.
07:45 tomba: I can make it work if I just remove the "t_vblank = 0;" part, as then a value from ktime_get() will be used. However, then weston reports "unexpectedly large timestamp jump (from 7646429 to 7664910)" when unblanking the display. So... weston expects the timestamp to only run when the display is enabled?
08:05 pq: tomba, there are no "sometimes invalid" timestamps in KMS to my belief.
08:06 tomba: pq: the kernel has a comment "assign 0 for now, to mark the vblanktimestamp as invalid".
08:07 pq: tomba, the "unexpectedly large jump" is less of a problem. I don't recall exactly the conditions for that.
08:07 tomba: but maybe that 0 not supposed to go to the userspace. here it seems to go, though.
08:08 tomba: pq: I think it's just that before display blank, there are flip events with certain timestamps. then the display is blanked for a time, and when enabled, there's a flip event with much higher timestamp. so the diff between the previous and next timestamp is "unexpectely large". but, I have to say, I'm just guessing on what weston does here, I haven't looked.
08:09 pq: tomba, sounds reasonable.
08:10 pq: I don't think Weston ever blanks the display while waiting for a pageflip event. It cannot, really, since blanking is an atomic commit and userspace cannot commit until the previous commit has completed.
08:12 pq: Weston does have some code to try and fetch the latest vblank timestamp when it is preparing for an atomic commit after a pause.
08:13 pq: Weston wants the latest timestamp, so that it can predict when the next commit would complete, and to know if it can commit in time to meet the very next vblank.
08:13 tomba: The assert I see when enabling the display is "weston_output_finish_frame: Assertion `timespec_sub_to_nsec(stamp, &output->frame_time) >= 0' failed.".
08:14 tomba: pq: yep. I think that makes sense. but for some reason the kernel sets the timestamp to 0.
08:14 pq: yes, these are possibly to different problems. The timestamp going backwards should never happen from userspace perspective.
08:14 pq: *two
08:15 tomba: but only when the driver does drm_crtc_vblank_on(). then when we do "real" page flips, the timestamp is there.
08:15 pq: maybe the query for the latest vblank timestamp is somehow botched while the display is off?
08:16 tomba: it's just a ktime_get(). the timestamp is there, but because it's not a "high precision timestamp", the framework sets it to 0
08:19 zamundaaa[m]: <pq> "tomba, there are no "sometimes..." <- Unfortunately in practice, things are different
08:20 zamundaaa[m]: amdgpu has a bug where it sends 0 for some pageflips, and out of tree drivers like the one for displaylink and the proprietary NVidia driver always send 0
08:21 pq: tomba, the "get latest timestamp" code is at https://gitlab.freedesktop.org/wayland/weston/-/blob/14.0/libweston/backend-drm/drm.c?ref_type=heads#L846 and till the end of that function.
08:22 pq: zamundaaa[m], should report bugs on those.
08:26 pq: tomba, I think Weston does expect vblank timestamps to make progress during CRTC being off, but it also has code to deal with the timestamp being invalid for the query. The timestamp cannot be invalid for an actual flip though, even for the first flip on turning a CRTC on.
08:32 tomba: pq: yep. I think that's reasonable. I can't figure out why the drm framework sets the timestamp to 0 on drm_crtc_vblank_on() call, and "reinitialize delayed at next vblank interrupt". I don't see how the ktime_get() timestamp is better at the vblank interrupt =)
08:33 pq: I guess vblank_on can be called at any phase of the scanout cycle, while the interrupt happens at a much better point of the cycle.
08:33 pq: *better known
08:34 pq: assuming the hardware continues to scan out even while vblank tracking is off
08:35 tomba: hmm perhaps... on the TI hardware, when we enable the crtc (and call drm_crtc_vblank_on()), the scanout cycle starts. so it's the zero point, so clear phase in the cycle.
08:36 pq: right, I think the difference is that a driver could disable vblank tracking even while the CRTC is scanning out, to conserve power when the interrupts are not necessary.
08:37 pq: so in that situation, it makes sense to wait for the interrupt when turning the interrupts on again, if the driver cannot determine the timestamp otherwise
08:37 tomba: yes, but I think that's different, it's handled with drm_crtc_vblank_get/put
08:37 pq: hmm, ok
08:38 pq: unfortunately I have no idea of the DRM kernel internals really
08:39 pq: so for your hardware, and for turning a CRTC on, you would really want to ktime_get() at the crtc enable time.
08:39 pq: can't you override the DRM core/helper behavior to achieve that?
08:40 pq: tomba, I hope I could give at least some clues here. Good hunting. :-)
08:41 tomba: pq: well, I can just comment out the t_vblank = 0; line =) and yes, I could reimplement drm_crtc_vblank_on(), but that migth be quite a lot. and to be honest, I don't see anything special with the TI hardware here, I would assume many other display subsystems behave the same. which makes me wonder what's wrong with tidss driver =)
08:49 zamundaaa[m]: <pq> "zamundaaa, should report bugs on..." <- For the proprietary NVidia driver, it's not possible to fix it afaik, something about missing symbols
08:50 zamundaaa[m]: amdgpu I of course filed a bug report already. What I'm saying is that it's best for compositors not to assert when it happens
10:51 glehmann: jenatali: are you sure this code doesn't want to use -FLT_MAX? https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/microsoft/compiler/dxil_nir_lower_int_samplers.c#L433
12:30 glehmann: dj-death: congrats on landing ESO
12:46 dj-death: glehmann: thanks
13:23 cmeissl[m]: I am trying to switch from mali to panfrost and while most things worked right away direct scan-out of wayland clients stopped working. It looks like creating the dumb buffer fails with EACCES. From what I can tell debugging the calls in mesa the kms_fd points to a render node of the display controller node, which is set by the wayland platform when receiving the default feedback. The ioctl DRM_IOCTL_MODE_CREATE_DUMB then fails because
13:23 cmeissl[m]: DRM_RENDER_ALLOW is not set and the device node is a render node.
13:24 cmeissl[m]: As a quick test I patched and re-compiled the kernel with DRM_RENDER_ALLOW for the dumb buffer ioctls.
13:24 jenatali: glehmann: I think you're right
13:25 cmeissl[m]: With DRM_RENDER_ALLOW direct scan-out works again, but that doesn't sound right?
13:39 daniels: cmeissl[m]: presumably you're using a downstream kernel where either the mediatek or rockchip driver is patched to declare DRIVER_RENDER, despite not being able to do any rendering. nothing in the wider stack expects this. just patching that out should fix your problem completely.
13:42 cmeissl[m]: daniels: yeah, downstream kernel from mediatek (based on genio 510). already had to backport some stuff around AFBC, so I am not surprised that it might need more changes. will give it a try, thanks!
13:44 daniels: np
14:41 cmeissl[m]: jup, removing DRIVER_RENDER worked, thanks again :)
14:41 cmeissl[m]: btw, awesome work from everybody, especially for supporting the NV12 MTK Tiled format!
18:17 mlankhorst: airlied: looking a bit at your cgroup series, doesn't look too different from what I was originally testing with xe against dmemcg