00:22friendlyinvader[d]: redsheep[d]: I mean it's not corrupt in that case, it's just a duplicate block, but that shouldn't have any impact
01:32airlied[d]: karolherbst[d]: skeggsb9778[d] gfxstrand[d] so nvk and nvc0 seem to program index buffer size differently, nvk seems to just do range, but nvc0 seems to do end addr
01:33airlied[d]: maybe it changed on turing?
01:41gfxstrand[d]: I think nvc0 is just wrong
01:41gfxstrand[d]: But it also may have changed at some point
01:42gfxstrand[d]: NVK is correct in any case
01:45gfxstrand[d]: Yeah, it changed on Turing
01:45gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/vulkan/nvk_cmd_draw.c?ref_type=heads#L2570
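
A minimal sketch of the difference described above, in Python purely for illustration. The function name and the `turing_or_later` flag are hypothetical, and the assumption that the older hardware takes an inclusive end address (last valid byte) comes from reading this conversation, not from hardware documentation.

```python
def index_buffer_bound(base_addr: int, range_bytes: int, turing_or_later: bool) -> int:
    """Return the bound value programmed next to the index buffer base address.

    Assumption: Turing and later take the size of the bound range, while
    earlier hardware takes the address of the last valid byte.
    """
    if range_bytes <= 0:
        raise ValueError("range_bytes must be positive")
    if turing_or_later:
        # Newer hardware: just the range in bytes.
        return range_bytes
    # Older hardware: inclusive end address of the buffer.
    return base_addr + range_bytes - 1
```

Mixing the two up would make the hardware see a bound that is either far too large or far too small, which would fit the kind of mismatch airlied noticed between nvk and nvc0.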
01:57airlied[d]: skeggsb9778[d]: I think the bug I'm seeing with f40 is actually this but worse /* XXX: getting a page fault at the end of the code buffer every few
01:57airlied[d]: * launches, don't use the last 256 bytes to work around them - prefetch ?
01:57airlied[d]: */
01:58airlied[d]: going to try increasing that 256 to 4096 and see if it happens again
02:08skeggsb9778[d]: oooh, interesting
02:08skeggsb9778[d]: that's likely simpler to fix if it's it
02:18airlied[d]: nvk seems to overalloc by 4096 for heap
02:27airlied[d]: karolherbst[d]: skeggsb9778[d] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29722
02:31skeggsb9778[d]: Nice!
06:56godvino[d]: karolherbst[d]: Hey I tested my device with the NVIDIA proprietary drivers and the gpu doesn't power down with it either.
06:56godvino[d]: I think it's the bios/firmware that's powering it up.
06:57godvino[d]: Definitely not a nouveau bug.
07:14tiredchiku[d]: fwiw the gpu suspends fine for me on all driver configurations
07:31karolherbst[d]: godvino[d]: yeah.. that's what my "weird interrupt stuff" comment was about. But it's still weird, the GPU being powered up through the firmware is something else than the kernel setting the power state. It might as well be a core linux bug, not something nouveau or nvidia can do anything about
07:31karolherbst[d]: this entire GPU suspend/resume thing is pretty complicated
07:47godvino[d]: What's weird is nouveau actually manages to power down the gpu for a split second but it gets powered up again by the bios/firmware/whatever
07:48godvino[d]: Also noticed pressing fn+q (thermal mode switcher on lenovo devices) causes the gpu to power up
07:48godvino[d]: Any idea how to stop nouveau from trying to power down the gpu ?
07:49godvino[d]: At least I wouldn't have to experience the microstutters
07:49karolherbst[d]: `nouveau.runpm=0`
07:49karolherbst[d]: but
07:49karolherbst[d]: you can also whack the `power/control` file in the sysfs device node
07:49karolherbst[d]: just write `on` instead of `auto` into it
07:50karolherbst[d]: but yeah.. nouveau enables it on boot itself
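
A hedged sketch of the sysfs approach mentioned above, written in Python. The helper name is hypothetical, and the device directory in the usage note is only an example PCI address; the real one depends on the machine, and writing to it normally needs root.

```python
from pathlib import Path


def set_runtime_pm(device_dir: str, mode: str) -> Path:
    """Write `on` or `auto` to the power/control file under a device node."""
    if mode not in ("on", "auto"):
        raise ValueError("mode must be 'on' or 'auto'")
    control = Path(device_dir) / "power" / "control"
    # "on" keeps the device powered; "auto" lets runtime PM suspend it.
    control.write_text(mode)
    return control
```

Usage would look something like `set_runtime_pm("/sys/bus/pci/devices/0000:01:00.0", "on")`, where the address is a placeholder. Unlike `nouveau.runpm=0` on the kernel command line, this does not persist across reboots.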
09:57ahuillet[d]: airlied[d]: this may be tied to NVC997_SET_PIPELINE_PROGRAM_PREFETCH on Ampere+
10:14ahuillet[d]: or maybe not, i am actually not sure, but I found the padding to use (2048). left a comment on the MR
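
A small sketch of the workaround being discussed: keep shader code out of the tail of the code buffer so that a prefetch running past the last instruction still lands in mapped memory. The function name is hypothetical, and 2048 is simply the padding value mentioned above; whether the padding is carved out of the buffer or added on top of it is an assumption here.

```python
def usable_code_bytes(alloc_size: int, tail_padding: int = 2048) -> int:
    """Return how many bytes of a code buffer may hold shader code.

    The last `tail_padding` bytes are left unused so a prefetch past the
    end of the final shader does not fault (assumed behaviour).
    """
    if tail_padding < 0 or tail_padding >= alloc_size:
        raise ValueError("padding must be non-negative and smaller than the buffer")
    return alloc_size - tail_padding
```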
11:19karolherbst[d]: ahuillet[d]: ohh, that reminds me. We have troubles calculating how many GRPs we have to assign for Volta+. As of now we just assign the highest gpr accessed + 3 (count + 2) and we were wondering what's up with that.
11:19karolherbst[d]: current working theory is that +2 is needed for ugprs or something
11:19karolherbst[d]: but I've seen shaders working with a lower count
11:23ahuillet[d]: *GPRs not GRP?
11:23ahuillet[d]: 'cause I have no idea what a GRP is :)
11:24karolherbst[d]: ehh yeah, GPR
11:24ahuillet[d]: seriously though, I'll go dig
11:45ahuillet[d]: (resolved in DM, for the audience here: +2 is the correct thing to use on Volta+)
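
The arithmetic above, spelled out as a tiny Python sketch. The function name is hypothetical; the only facts taken from the conversation are "highest GPR accessed + 3" and that this equals "count + 2".

```python
def volta_gpr_alloc(highest_gpr_accessed: int) -> int:
    """Number of GPRs to assign on Volta+ per the rule discussed above."""
    if highest_gpr_accessed < 0:
        raise ValueError("register index must be non-negative")
    # Registers are numbered from 0, so the count is the highest index + 1.
    count = highest_gpr_accessed + 1
    # The extra 2 is the Volta+ overhead confirmed in the chat.
    return count + 2
```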
11:46karolherbst[d]: so nothing to do for us :ferrisUpsideDown:
13:31karolherbst[d]: gfxstrand[d]: do you know a reason why one shouldn't use `nir_address_format_32bit_global` for `.phys_ssbo_addr_format`? If there is none, we could probably get rid of the `global_2x32` stuff in mesa
13:32karolherbst[d]: `v3dv` is the only user and they ignore the second component entirely
13:32karolherbst[d]: and using `global` instead of `global_2x32` makes everything a lot simpler for me
13:40gfxstrand[d]: No reason I can think of
13:41karolherbst[d]: cool
13:41karolherbst[d]: I'm gonna remove tons of code :ferrisUpsideDown:
13:43gfxstrand[d]: Yay
13:44karolherbst[d]: I'm porting `v3d` and `v3dv` to `nir_lower_mem_access_bit_sizes` for CL reasons, and instead of supporting `global_2x32` there, I was thinking: if they just ignore the 2nd component/treat it as 0, I might as well remove it :ferrisUpsideDown:
13:48karolherbst[d]: though we also have `global_ir3` which is the same as `2x32` just I think freedreno actually cares about the high bits? But they also have an offset on top of the vec2...
13:48karolherbst[d]: oh well..
13:51gfxstrand[d]: I thought that was one where it was 64-bit but without 64-bit ALU
13:51gfxstrand[d]: Vc4 is 32-bit all the way, though.
13:51karolherbst[d]: yeah
13:52karolherbst[d]: but the `_etna` variant is vec2 , but they have an additional offset source
13:52karolherbst[d]: same as AMD, just that being 64 bit address + 32 bit offset
13:52gfxstrand[d]: So the thing that might burn you here is buffer device address. That needs to be 64-bit(ish)
13:52karolherbst[d]: ehh
13:52karolherbst[d]: `_ir3` also being vec2 + offset
13:52karolherbst[d]: etna being 32 + 32
13:53karolherbst[d]: gfxstrand[d]: sure, but the lowering would still insert a 0 on the high bit? mhh.. maybe I should run the CTS tests on that as well
13:54gfxstrand[d]: Yes, the top 32 bits will always be zero in that case.
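
A sketch of what "the top 32 bits will always be zero" means for a 64-bit(ish) buffer device address built from a 32-bit one. This is plain Python modelling the bit manipulation, not NIR builder code; the helper names are hypothetical and only mirror the pack/unpack idea.

```python
MASK32 = 0xFFFFFFFF


def pack_64_2x32(lo: int, hi: int) -> int:
    """Combine two 32-bit halves into one 64-bit value."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def unpack_64_2x32(value: int) -> tuple:
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return (value & MASK32, (value >> 32) & MASK32)


def addr32_to_device_address(addr32: int) -> int:
    """Expose a 32-bit GPU address as a 64-bit one with a zero high half."""
    return pack_64_2x32(addr32, 0)
```

A backend that only looks at the low half gets its original address back from `unpack_64_2x32(...)[0]`, which is why ignoring the high bits is harmless in this case.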
13:54karolherbst[d]: yeah, that's fine
13:54karolherbst[d]: the backend compiler of `v3dv` ignored the high bits
13:54karolherbst[d]: sooo...
13:54gfxstrand[d]: We can add an address mode for that if necessary
13:54karolherbst[d]: I mean.. I can keep the `2x32` code around if somebody really wants to
13:55gfxstrand[d]: Like, one that always does pack/unpack and 32-bit ALU
13:55karolherbst[d]: yeah...
13:55gfxstrand[d]: But I don't see a point to a vec2 mode
13:55karolherbst[d]: same
13:55karolherbst[d]: especially when the only user ignores .y
13:56gfxstrand[d]: I'm not even sure if any faking is required. I'd have to dig into what vtn does
13:56gfxstrand[d]: Could be that everything "just works"
13:58karolherbst[d]: uhh.. I run into spirv parsing issues :blobcatnotlikethis:
13:59karolherbst[d]: `dEQP-VK.binding_model.buffer_device_address.set0.depth1.baseubo.convertuvec2.nostore.single.std140.comp` mhh
13:59karolherbst[d]: `Source (%81) and destination (%86) of OpBitcast must have the same total number of bits`
14:00gfxstrand[d]: That's what I was afraid of
14:00gfxstrand[d]: Let's add a fake 64-bit mode that has nicely defined semantics that do what we want.
14:01karolherbst[d]: I was considering using `nir_address_format_32bit_offset_as_64bit`, but uhm... there are asserts around to not use it for global I think
14:03karolherbst[d]: or maybe we should just relax that and trust users to use it with care :ferrisUpsideDown:
14:04karolherbst[d]: but I think the `NULL` pointer semantics are different there?
14:07karolherbst[d]: ehh yeah.. I've seen some of the code, not a great idea
14:23karolherbst[d]: okay, this wasn't too hard
14:32karolherbst[d]: maybe I should use `32bit_offset_as_64bit` in rusticl as well and just simplify a lot of the code by wasting a bit of space for pointers..
14:32karolherbst[d]: being able to treat every device as 64 bit would simplify a lot
14:42gfxstrand[d]: karolherbst[d]: Yeah, that makes me nervous.
14:42karolherbst[d]: same
14:42gfxstrand[d]: offsets are not fundamentally the same thing as addresses.
14:42karolherbst[d]: I've already added the code anyway
14:42gfxstrand[d]: a 32bit_addr_as_64bit or somesuch?
14:43karolherbst[d]: yeah
14:43karolherbst[d]: now I just have to check if `v3dv` is happy with the code or not 😄
14:47gfxstrand[d]: 😄
14:54asdqueerfromeu[d]: karolherbst[d]: How many Pi 4s do you have lying around? 🥧
14:55karolherbst[d]: asdqueerfromeu[d]: 2
14:55karolherbst[d]: I also have a pi5
14:56karolherbst[d]: uhhhh
14:59karolherbst[d]: gfxstrand[d]: with my fancy new `32bit_global_as_64bit` drivers need to implement `i2i64`, so that's kinda annoying
15:23gfxstrand[d]: Not pack/unpack?
15:23karolherbst[d]: not the 64 bit ones
15:24gfxstrand[d]: I mean that pack/unpack may be easier for drivers than i2i64
15:24karolherbst[d]: but it probably wouldn't be hard to add I guess...
15:24karolherbst[d]: ahhh
15:24karolherbst[d]: yeah.. probably
15:24gfxstrand[d]: That's what we do for 32bit_offset_as_global
15:25karolherbst[d]: the IR doesn't know about 64 bit values at all
15:26karolherbst[d]: at least for CL I could lower enough away that 64 bit values won't ever hit the backend compiler (some of the 32 bit libclc builtins cast to 64 bit)
15:28karolherbst[d]: the hacky solution would be to simply load/store the lower 32 bit of addresses, but...
15:28karolherbst[d]: mhh
15:28karolherbst[d]: maybe I just need to add more opts at the problem
15:31karolherbst[d]: `u2u32(ishl(i2i64(x), 0x4))` yeah uhm...
15:31karolherbst[d]: why wasn't that optimized
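
Why that expression looks optimizable: the low 32 bits of a left shift depend only on the low 32 bits of the input, so the round trip through 64 bits is equivalent to a plain 32-bit shift. Below is a Python model of the opcodes (the functions only imitate the NIR names, they are not Mesa code) that checks this on a few values.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def i2i64(x32: int) -> int:
    """Sign-extend a 32-bit bit pattern to 64 bits."""
    x32 &= MASK32
    return x32 | 0xFFFFFFFF00000000 if x32 & 0x80000000 else x32


def u2u32(x64: int) -> int:
    """Truncate to the low 32 bits."""
    return x64 & MASK32


def ishl64(x: int, shift: int) -> int:
    return (x << shift) & MASK64


def ishl32(x: int, shift: int) -> int:
    return (x << shift) & MASK32


def roundtrip_matches(x32: int, shift: int = 4) -> bool:
    """True if u2u32(ishl(i2i64(x), shift)) equals the 32-bit ishl(x, shift)."""
    return u2u32(ishl64(i2i64(x32), shift)) == ishl32(x32, shift)
```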
15:39karolherbst[d]: mhhh.. but I also see `load_global` with a 64 bit address... I guess I'll need to fix some places
15:50karolherbst[d]: `Passed: 1/1 (100.0%)` :3
15:50karolherbst[d]: I wasn't casting to 32 bit inside `addr_to_global`
15:54karolherbst[d]: now I have those `u2u32(load_uniform(...))` things...
15:54karolherbst[d]: gfxstrand[d]: do we have a solution for that or.... did nobody bother so far to reduce the width of the load?
15:55karolherbst[d]: or should drivers simply split those loads themselves?
15:56karolherbst[d]: but in either case, I don't think adding 64 bit values to the shader is really doing us a favor here...
16:04gfxstrand[d]: karolherbst[d]: Hrm... I'm not sure we have an optimization for that.
16:05gfxstrand[d]: lower_mem_access_bit_sizes can turn the 64-bit into a vec2 and I think we'll optimize away the resulting u2u32(pack(load))
16:05gfxstrand[d]: But IDK about reducing the load itself
16:09gfxstrand[d]: I think there is something to reduce loads. I just don't remember at the moment what it's called.
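
To illustrate the two steps being separated here: once a 64-bit load is expressed as a vec2 of 32-bit words, `u2u32(pack(load))` is just the first component, so the pack/truncate pair can fold away. Skipping the load of the second word is a separate optimization. The Python below is a toy model with a list standing in for memory; all names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def load_vec2_32(words: list, index: int) -> tuple:
    """Model a 64-bit load that was lowered to two 32-bit components."""
    return (words[index] & MASK32, words[index + 1] & MASK32)


def pack_64_2x32(vec2: tuple) -> int:
    lo, hi = vec2
    return (hi << 32) | lo


def u2u32(x64: int) -> int:
    return x64 & MASK32


def narrowed_load(words: list, index: int) -> int:
    """What an ideal optimizer would leave: a single 32-bit load."""
    return words[index] & MASK32
```

For any contents, `u2u32(pack_64_2x32(load_vec2_32(words, i)))` and `narrowed_load(words, i)` agree, but only the second avoids reading `words[i + 1]`.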
16:13karolherbst[d]: gfxstrand[d]: right.. but I don't want to bring `load_uniform` into `lower_mem_access_bit_sizes`, because that's just pain on its own
16:14karolherbst[d]: like.. is it in bytes? or vec4? or 4 bytes steps or how does the addressing work?
16:14karolherbst[d]: or maybe somebody should really go through the pain of making `load_uniform` be always in bytes...
16:14karolherbst[d]: and fix up all the drivers
16:16gfxstrand[d]: Right...load_uniform is a pain
16:42karolherbst[d]: maybe I should just handle `global_2x32` inside `lower_mem_access_bit_sizes` for now... but that's kinda uhm.. annoying knowing the only user doesn't care about anything happening in the upper bits...
17:58karolherbst[d]: I have an idea what I'll do for now :ferrisUpsideDown:
19:56karolherbst[d]: anyway, this is what I ended up doing: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29711
20:46gfxstrand[d]: I may have found some of our Zink instability: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29737/diffs?commit_id=69e0033421217180a79cd3be53cddb508d2e92f7
20:47gfxstrand[d]: I still think we have kernel bugs but that one's mine. 😩
21:10redsheep[d]: gfxstrand[d]: What kind of issues were you seeing that prompted this? Wondering what I should retest
21:12redsheep[d]: If it's only on destroy that doesn't really sound like the Wayland or sync issues, though maybe the chromium bug is due to it destroying and recreating buffers it uses?
21:47redsheep[d]: gfxstrand[d]: probably best not to merge that. I just built that branch to try and run my session and it broke sddm. Turned into a kind of bizarre light show. I got logged in and the x11 session is blown up too, all the UI elements smashed in a corner, with one screen strobing
21:58gfxstrand[d]: Really? That's surprising
21:59redsheep[d]: I put my session back to nvc0 and that still works fine, but trying to invoke zink for games also breaks
21:59gfxstrand[d]: Oh boy...
22:00gfxstrand[d]: And you're sure it's that and not the UBO changes?
22:00redsheep[d]: Is this branch based on that? I thought that had not merged yet
22:00redsheep[d]: I built the branch, not just tacking on the patch
22:01gfxstrand[d]: Yeah, I merged a bunch of UBO stuff yesterday
22:01redsheep[d]: Oh. Let me rebuild main and check
22:01gfxstrand[d]: Nothing that should regress perf but a bunch of it
22:05redsheep[d]: Alright I have main building now
22:08gfxstrand[d]: Either outcome scares me but I think UBOs breaking things is less likely to make me doubt my sanity. 😅
22:09redsheep[d]: Yeah I mean I had used the session just fine with that much heavier branch and it was fine on the UBO stuff
22:09gfxstrand[d]: Hrm...
22:09gfxstrand[d]: I mean, if something's going to run into import/export bugs, it's going to be Zink.
22:10redsheep[d]: I hope AMD comes out with a 24 core, these builds are too slow 😛
22:12redsheep[d]: Oh boy. Main is broken.
22:12gfxstrand[d]: Okay
22:12gfxstrand[d]: Let me throw some piglit at it and see if that triggers anything
22:17redsheep[d]: I see this also merged: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29715
22:17redsheep[d]: So it's not only nvk that has been touched
22:17gfxstrand[d]: Oh, yeah. I'd try reverting that
22:17redsheep[d]: Let me see if I can isolate to zink or nvk with an icd
22:18gfxstrand[d]: I could totally believe that UBOs hosed Zink, though. It tends to be a bit special there
22:20redsheep[d]: Yeah looks like it's in nvk. Pointing at the icd for the cb0-rebase branch I left hanging around makes minecraft load fine with current mesa main zink
22:21zmike[d]: smh implying zink has bugs in the current month of the current year
22:22redsheep[d]: Wait I thought the propaganda was nvk has no bugs, this is so confusing
22:23zmike[d]: obviously that's just propaganda
22:31gfxstrand[d]: redsheep[d]: Wait, cb0-rebase is fine but main isn't?
22:31redsheep[d]: Yes
22:31gfxstrand[d]: Well, that is odd....
22:31redsheep[d]: Or at least, it was as of like 27 hours ago
22:32gfxstrand[d]: What about main with NVK_DEBUG=no_cbuf?
22:33redsheep[d]: Oh that works fine
22:33redsheep[d]: But, I didn't use no_cbuf yesterday...
22:34gfxstrand[d]: Yeah, so the rebase branch you had had `no_cbuf` implied
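
For anyone retesting along, a hedged sketch of running a program with the workaround above. `NVK_DEBUG=no_cbuf` is the setting named in the chat; the helper names and the example command are hypothetical, and any value already set in `NVK_DEBUG` is simply overwritten here.

```python
import os
import subprocess


def env_with_no_cbuf(base_env=None) -> dict:
    """Return a copy of the environment with NVK_DEBUG=no_cbuf set."""
    env = dict(os.environ if base_env is None else base_env)
    env["NVK_DEBUG"] = "no_cbuf"
    return env


def run_with_no_cbuf(command: list) -> int:
    """Run `command` with the workaround enabled and return its exit code."""
    return subprocess.run(command, env=env_with_no_cbuf()).returncode
```

Something like `run_with_no_cbuf(["vkcube"])` would be one way to use it, assuming `vkcube` is installed.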
22:34gfxstrand[d]: Looks like I broke the cbuf stuff somehow. I'm not all that surprised but I thought I'd gotten all the corners.
22:34gfxstrand[d]: piglit is still running
22:35redsheep[d]: Maybe I will test my entire session with no_cbuf then so I can still check if that latest MR helps zink issues
22:35gfxstrand[d]: Yeah, feel free
22:36gfxstrand[d]: At least that should tell me if there's any other breakage
22:36gfxstrand[d]: I'll look at these piglits in another 10 min
22:37redsheep[d]: Ok as expected the no_cbuf zink session works fine
22:38redsheep[d]: Yay! You fixed the wayland session!
22:39redsheep[d]: 👏
22:39redsheep[d]: You can add fixes 11163
22:40redsheep[d]: (Or I will just close it when this merges)
22:40zmike[d]: what about #10477
22:41gfxstrand[d]: redsheep[d]: Wait, what fixes that?
22:42redsheep[d]: I would expect 29737 did, that or running no_cbuf did, but I doubt it
22:42redsheep[d]: No way to be sure unless I patched an older mesa with 29737
22:44redsheep[d]: The cb0-rebase didn't have it fixed and it looks like the current state of main is really similar to that aside from cbuf behavior and 29737
22:45redsheep[d]: zmike[d]: I don't have gnome so I can't be sure, but probably? There's very similar symptoms there
22:46asdqueerfromeu[d]: https://gitlab.freedesktop.org/gfxstrand/mesa/-/commits/drm-rs :ferris:
22:46redsheep[d]: asdqueerfromeu[d]: What about it
22:50redsheep[d]: Wait, what am I saying, I am not running 29737 right now
22:51redsheep[d]: I am just on main with no_cbuf
22:53redsheep[d]: gfxstrand[d]: I realize now I only tested cb0-rebase with wayland before your patch that messed with whether it behaves like no_cbuf, so it's not 29737
23:02redsheep[d]: Sorry for the mixup, I am testing 29737 now with no_cbuf and that has a working wayland session as well. It doesn't appear to have fixed chromium corruption which was the other issue I had hoped it would fix
23:04gfxstrand[d]: redsheep[d]: Okay, that makes sense. I would have been surprised if 29737 fixed anything that easily reproducible.
23:05redsheep[d]: Yeah it will take me a while longer to tell if 29737 or no_cbuf fixes flicker
23:17redsheep[d]: Oh, flicker is still there. Most of the time it remains subtle though. Not the most pressing issue
23:17redsheep[d]: It seems no_cbuf doesn't *quite* get the wayland session working well, getting some random issues where it looks like the session crashed, but I have no coredumps to look at...
23:18redsheep[d]: Whatever is happening is enough to eat my clipboard contents though
23:20redsheep[d]: Nothing in dmesg either. Hate getting stuff like this because I can't even get enough info to open a useful issue, especially where it's not the default config.
23:24redsheep[d]: Alright, yeah I can't find anything fixed by 29737 then. Glad I have a new workaround though.
23:38gfxstrand[d]: 091a945b57995a0184bf83085e2dc5b5e8fa619b is the first bad commit
23:38gfxstrand[d]: commit 091a945b57995a0184bf83085e2dc5b5e8fa619b
23:38gfxstrand[d]: Author: Faith Ekstrand <faith.ekstrand@collabora.com>
23:38gfxstrand[d]: Date: Wed May 15 15:32:21 2024 -0500
23:38gfxstrand[d]: nvk: Be much more conservative about rebinding cbufs
23:38gfxstrand[d]: Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29591>
23:39redsheep[d]: You were able to replicate and bisect?
23:40redsheep[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1251320434377621575/kwinError.txt?ex=666e2677&is=666cd4f7&hm=1fd39dd1bc0cc8be661c9a3198f68cb7e2999749d0eea0d08ac37cb2eab864f4&
23:40redsheep[d]: While I am still thinking about it, here's the logging I did manage to get on the remaining wayland issue when running no_cbuf
23:40gfxstrand[d]: Yeah, it blows up piglit bad
23:48redsheep[d]: So much for your fun friday 😅
23:49zmike[d]: https://media1.tenor.com/images/54451401d52c0dd2fe9ee5752857d53c/tenor.gif
23:52redsheep[d]: Oh the reporter on 10477 answered, not seeing a change from no_cbuf so that sounds like something else. Now that I know what good stack traces look like I am beginning to understand the suffering when one without debug symbols comes up lol
23:54gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29737/ now contains cbuf fixes, too.
23:55redsheep[d]: Ok lemme rebuild
23:55gfxstrand[d]: Zink seems pretty happy with that, though piglit will take a while.
23:56gfxstrand[d]: I'm a little surprised how slowly piglit runs on my 36T CPU
23:56gfxstrand[d]: I kinda wonder if it isn't vkEnumeratePhysicalDevices bound
23:57gfxstrand[d]: Lots of opening the drm device isn't exactly ideal on nouveau
23:57redsheep[d]: Oh also, it kind of seems like I have exposed yet another bug now that I can use the wayland session. When I run discord through nvc0 on the zink session it has new and different intermittent graphical issues that don't occur when using nvc0 discord with an x11 zink session
23:58redsheep[d]: I have been running nvc0 discord as a workaround for the other visual issues
23:59karolherbst[d]: could be multithreading issues