01:14anarsoul: enunes: for https://gitlab.freedesktop.org/mesa/mesa/-/issues/8262 do you have a dump that reproduces the issue?
01:15anarsoul: I really wonder how texture descriptors look for these 4096x4096 textures
10:16enunes: anarsoul: I can share one later, I looked at the descriptor but didn't spot anything wrong so far
10:17enunes: it can also be the viewport descriptor, since it also writes to that texture while it's attached to an fbo
15:49anarsoul: enunes: it's a fault on read. So it's either a varying, uniform or texture
16:50anarsoul: do we have any tool to decode the dump?
16:51anarsoul: https://gitlab.freedesktop.org/lima/lima_dump just replays the dump
17:21enunes: anarsoul: don't know, I noticed that too and decided to just replay the trace and use the dump env var for a decoded dump
17:21anarsoul: what is the dump env var?
17:23enunes: LIMA_DEBUG=dump
17:32anarsoul: enunes: but how does it work? lima_dump replay doesn't seem to use mesa
17:33enunes: I meant I replayed the apitrace, not the lima_dump dump
17:35anarsoul: oh
17:35anarsoul: that may not be the same, it'd make sense to replay (or decode) the failed job
17:36anarsoul: I guess it shouldn't be difficult to adapt https://gitlab.freedesktop.org/lima/mali-syscall-tracker to decode dumps
17:36enunes: I agree but as I posted there, even if replaying the trace doesn't crash, it renders some garbage, from what I have seen so far basically due to the same bug
17:37anarsoul: interesting
17:37anarsoul: does it actually use 4096x4096 textures?
17:39enunes: no, seems to be 4096x32
17:39enunes: these seem to be two interesting frames (search for 4096): https://paste.centos.org/view/274844ed https://paste.centos.org/view/b4194d52
17:40enunes: now I noticed that the first is a blit, and running with LIMA_DEBUG=noblit seems to avoid the bug as well, so it might be something in the lima blitter. The scissor doesn't match in that blit either
17:50anarsoul: format is LIMA_TEXEL_FORMAT_RGBA_8888
17:51anarsoul: enunes: it might be a missing dependency somewhere?
17:51anarsoul: if it gets fixed by limiting width, the job is completed faster and its resources aren't released while it's still in flight
17:53enunes: with LIMA_DEBUG=noblit it can even run with regular 4096 max texture, and the blit job should take longer, so if anything it needs to be related to the blit cmd, no?
17:53enunes: is it ok that the scissor is 2048 and the blit is 4096?
17:54anarsoul: enunes: I don't see why not?
17:55anarsoul: I don't see VS or PLBU commands in the dump
17:55enunes: we have a comment saying /* Scissored blit isn't implemented yet */ and it should fall back to u_blitter but seems like it may be continuing in some case and miscalculating the scissor cmd for the blit
17:55anarsoul: scissor and viewport should be clamped by fb size though
17:58anarsoul: enunes: it uses scissor for handling dst box
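[editor's note] The clamping rule mentioned above ("scissor and viewport should be clamped by fb size") can be sketched as follows. This is only an illustration of the idea, not lima's actual code: the function name `clamp_scissor`, the tuple layout, and the 2048-wide framebuffer in the example are all assumptions made for the sketch.

```python
def clamp_scissor(scissor, fb_width, fb_height):
    """Clamp a (minx, miny, maxx, maxy) scissor box to the framebuffer.

    Hypothetical helper: the name and tuple layout are assumptions for
    illustration, they do not mirror any lima function.
    """
    minx, miny, maxx, maxy = scissor
    # Keep every edge inside [0, fb_width] x [0, fb_height].
    minx = max(0, min(minx, fb_width))
    maxx = max(0, min(maxx, fb_width))
    miny = max(0, min(miny, fb_height))
    maxy = max(0, min(maxy, fb_height))
    return (minx, miny, maxx, maxy)


# A 4096-wide dst box against an assumed 2048x32 framebuffer.
print(clamp_scissor((0, 0, 4096, 32), 2048, 32))
```

If the dst box of a blit is expressed through the scissor, a box that is wider than the framebuffer would be cut down by this kind of clamp, which is one way the two sizes could end up not matching.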
18:01anarsoul: enunes: can you post complete dump?
18:02enunes: it's hundreds of frames, the first one I posted is complete, the second I cropped because pastebin complained it's too big, but I think the rest is uninteresting
18:05anarsoul: I see
18:15anarsoul: I suspect some dependency is not handled in lima_blit
18:15anarsoul: these are hard to debug
18:45anarsoul: enunes: looks like 0x6186000 VA of 4096x32 texture from 2nd paste is referenced in pp frame in the 1st
18:47anarsoul: I assume it's the same context, and lima_do_blit does add the job to ctx->write_jobs
19:27enunes: anarsoul: I still think it's something simpler than that... just hacking the blit scissor size to follow size of the fb seems to avoid the crash as well
19:27anarsoul: enunes: but it likely breaks the blitter
19:27anarsoul: :)
19:28anarsoul: changing scissor size changes timings, so I'd still bet on dependency issue
21:50anarsoul: enunes: out of curiosity, does it work correctly if you make jobs synchronous?
21:51anarsoul: if you enable dumping, it should be synchronous
21:52anarsoul: so do you see the issue with LIMA_DEBUG=dump?
21:56anarsoul: btw, flushing blit job immediately seems to be suboptimal
21:56anarsoul: but I don't really remember why I did it like that
21:57anarsoul: in theory dropping lima_do_job() from lima_do_blit() should work just fine
22:06enunes: anarsoul: singlejob or dump seem to make no difference, reproduces with or without
22:06anarsoul: then it's likely not a race
22:07enunes: interestingly nobocache makes it crash on my system too rather than just rendering garbage
22:07anarsoul: hmm
22:07anarsoul: sounds like a missing dep?
22:07anarsoul: i.e. some job is using a BO, but it wasn't added to a job
22:09anarsoul: but it should give you a clue on what it's attempting to reuse
22:10anarsoul: if it's a gpu crash just compare fault VA with dump
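[editor's note] Comparing a fault VA with the dump amounts to finding which dumped buffer's address range contains the faulting address. A minimal sketch, under stated assumptions: the `Buffer` type, the `find_buffer` helper, and the "uniforms" entry are made up for illustration; only the 0x6186000 VA, the 4096x32 size, and the RGBA_8888 format come from the discussion above, and the size calculation ignores any alignment or padding the driver may add.

```python
from typing import NamedTuple, Optional, Sequence


class Buffer(NamedTuple):
    name: str
    va: int    # GPU virtual address of the first byte
    size: int  # length in bytes


def find_buffer(fault_va: int, buffers: Sequence[Buffer]) -> Optional[Buffer]:
    """Return the buffer whose [va, va + size) range contains fault_va."""
    for buf in buffers:
        if buf.va <= fault_va < buf.va + buf.size:
            return buf
    return None


# 4096x32 texels at 4 bytes each (RGBA_8888), padding ignored (assumption).
TEX_SIZE = 4096 * 32 * 4

buffers = [
    Buffer("4096x32 texture", 0x6186000, TEX_SIZE),
    Buffer("uniforms (made-up entry)", 0x7000000, 0x1000),
]

# Hypothetical fault address a little past the start of the texture.
hit = find_buffer(0x6186000 + 0x40, buffers)
print(hit.name if hit else "fault VA not covered by any dumped buffer")
```

A fault VA that matches no buffer listed for the faulting job, but does match a buffer from another job, would point toward a BO that is being used without having been added to the job.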