06:52airlied[d]: karolherbst[d]: mohamexiety[d] you know the drill, this time I've done some testing on a laptop 🙂
10:00mohamexiety[d]: still no change 🙁
10:00mohamexiety[d]: did you test switching? since for me it boots at 60hz and i have to change it to 120hz
10:00mohamexiety[d]: so it might be that it does actually work unless you have to switch
10:01mohamexiety[d]: because like
10:01mohamexiety[d]: even changing to 4k30hz, which is perfectly within the older limits, doesn't succeed and gives me a signal lost
10:02mohamexiety[d]: karolherbst[d]: this has been a constant with each patch on switching
10:10karolherbst[d]: for me it just doesn't work at all, and the booting breaks randomly
10:13karolherbst[d]: airlied[d]: so with the latest version, when it works, that means I plug it in after booting, it's capped at 4K@60
10:13karolherbst[d]: and it doesn't shut down
10:40karolherbst[d]: So what are our plans in regards to https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41179 ?
10:40karolherbst[d]: (the fmul_rtz interpolation thing)
13:10zmike[d]: phomes_[d]: Typically I just capture the whole thing and then trim later. This seems to be a bit more reliable for producing usable traces
13:10zmike[d]: karolherbst[d]: Replay with `-m rebind` may work around this
13:11karolherbst[d]: yeah it didn't
13:12zmike[d]: If that download link on drive will stay up for a couple days I'll report it and get it tracked
13:16zmike[d]: Try capturing a full trace and test whether that replays, then try trimming that and verify it still replays
13:41karolherbst[d]: zmike[d]: well.. it was created with a gfxreconstruct version that was on a commit from an older version of a branch for a PR
13:41karolherbst[d]: so who knows if that was just breaking things
13:43zmike[d]: ah
13:43zmike[d]: yeah maybe
13:46phomes_[d]: Yeah that was with a MR version I used previously before wayland key support landed. The MR got updated before it landed
13:46karolherbst[d]: not saying there might not be a legitimate bug, but it's also one of those cases where I would go "so you used some experimental branch and now you file an issue for that? I'm gonna close that one"
13:48phomes_[d]: I have tried today with top of the dev branch. But now the captures are all just green screens and artifacts on replay. Will try a full capture and trim to one frame after
14:05zmike[d]: has anyone seen this https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15337
14:32karolherbst[d]: it feels like something I would be hitting, but sadly I don't think I did, unless it's just some weirdo use-after-free situation
14:32karolherbst[d]: _but_ I think there is a way to force gtk-4 to use GL over vulkan
14:33karolherbst[d]: `GSK_RENDERER=gl`
14:34karolherbst[d]: I think
19:21airlied[d]: Could be memory corruption elsewhere, don't think I've seen fedora reports of it
20:31mhenning[d]: mohamexiety[d]: Is https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40473 something we should try to get in for 26.1.0? It looks ready to me other than the couple of comments I left. I'm not sure we should backport it to stable but I think backporting to 26.1 is still reasonable at this point
20:33mohamexiety[d]: yeah i did see your comments and applied them locally (aside from the commit order since there's no dependence between the 2) but was waiting in case faith would look at it again or such and find something else
20:34mohamexiety[d]: regarding backports we could backport the very first commit to stable (the offset one) as that solves the bug with zink without adding any new features i guess
20:35karolherbst[d]: ooff.. seems some compute shaders in Control cause really bad occupancy
20:35karolherbst[d]: 48 -> 12 warps apparently
20:36karolherbst[d]: which... is a bit odd?
20:36karolherbst[d]: need to dig into this after the stats completed
20:36mhenning[d]: mohamexiety[d]: the first commit? I'm guessing you don't actually mean "zink: Assert if we try to use a dedicated allocation with offset > 0"?
20:37karolherbst[d]: I think my calculation is a bit broken 🙃
20:37mohamexiety[d]: yeah the first nvk one, the one that adds the plane offsets and stores them
20:38chikuwad[d]: when is 26.1 branchpoint btw?
20:38mhenning[d]: oh, "nvk: Calculate and stash the plane offset and alignment at create time" then?
20:38karolherbst[d]: `workgroup_size: 1, 1, 26` 🥲
20:38mhenning[d]: chikuwad[d]: a few weeks ago
20:38chikuwad[d]: ah rip, ok
20:39mohamexiety[d]: mhenning[d]: yeah, that one solves the zink/egl bug
20:39mhenning[d]: don't worry, there's always next release 🙂
20:39chikuwad[d]: I was gonna say I'd like to land atomic f16 before branchpoint but bleh
20:39chikuwad[d]: mhenning[d]: oh yeah absolutely
20:39chikuwad[d]: I'd have also worked on it more but I'm struggling a bit to get the GLES CTS running
20:40chikuwad[d]: or rather, I need to re-work my deqp script and have been procrastinating on that
20:42karolherbst[d]: wait...
20:42karolherbst[d]: so apparently warps/SM dropping that low is actually legit
20:43karolherbst[d]: got other shaders that go from 48 to 3
20:43karolherbst[d]: and the occupancy calculator from CUDA says: yep.. that's 3
20:44karolherbst[d]: workgroup_size: 1, 1, 26
20:44karolherbst[d]: shared_size: 29328
20:44karolherbst[d]: 🥲
20:45mhenning[d]: gfxstrand[d]: Were you planning to do another round of review on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40473 ? Just want to be clear if we're waiting on that or not
20:48mhenning[d]: chikuwad[d]: ah, yeah deqp can be a little annoying. although, for fixing that issue you can probably just run the one failing test which might require less scripting around it than a full run does
20:48chikuwad[d]: yeah I intend on running the one test
20:48chikuwad[d]: but I'd still like to re-do the script to make things easier
20:49chikuwad[d]: just turn it into a little tool I can use to run whatever version of deqp needed with whatever caselist I provide
20:49karolherbst[d]: I _really_ hope that https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33914 is going to help with some stuff but I gotta verify that first
20:49chikuwad[d]: run-deqp gles31 failing.Test as opposed to pulling out a monstrosity of a command
20:50chikuwad[d]: my real issue is that there's no nice way to override the GL driver like there is VK_ICD_FILENAMES, I have to use LD_LIBRARY_PATH
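A minimal sketch of the `run-deqp <suite> <test>` wrapper described above, in Python. It only builds the command line and environment. The per-suite binary paths inside the VK-GL-CTS build tree, the `lib` directory under the Mesa install prefix, and the NVK ICD manifest name are assumptions to adjust for your own setup. `--deqp-case` and `--deqp-caselist-file` are standard dEQP options; for GL/GLES it falls back to prepending `LD_LIBRARY_PATH`, as mentioned, since there is no `VK_ICD_FILENAMES` equivalent there.

```python
import os
from pathlib import Path

# ASSUMPTION: where each dEQP binary lives inside a VK-GL-CTS build tree.
# Adjust to match your own build.
SUITES = {
    "vk": "external/vulkancts/modules/vulkan/deqp-vk",
    "gles2": "modules/gles2/deqp-gles2",
    "gles3": "modules/gles3/deqp-gles3",
    "gles31": "modules/gles31/deqp-gles31",
}


def build_deqp_command(suite, cases, cts_dir, mesa_prefix):
    """Return (argv, env, cwd) for one dEQP run against a local Mesa build.

    `cases` is either a single test name/pattern or a path to a caselist file.
    """
    binary = Path(cts_dir) / SUITES[suite]
    argv = [str(binary)]
    if Path(cases).is_file():
        argv.append(f"--deqp-caselist-file={cases}")
    else:
        argv.append(f"--deqp-case={cases}")

    env = dict(os.environ)
    if suite == "vk":
        # Vulkan: point the loader at the locally built ICD manifest.
        # ASSUMPTION: manifest name/location of an NVK install on x86_64.
        env["VK_ICD_FILENAMES"] = str(
            Path(mesa_prefix) / "share/vulkan/icd.d/nouveau_icd.x86_64.json")
    else:
        # GL/GLES: no ICD-style override, so prepend the local Mesa libs.
        # ASSUMPTION: libraries are installed under <prefix>/lib.
        lib_dir = str(Path(mesa_prefix) / "lib")
        old = env.get("LD_LIBRARY_PATH")
        env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + old if old else "")

    # dEQP is usually run from the binary's directory to find its data files.
    return argv, env, str(binary.parent)
```

Running it is then `subprocess.run(argv, env=env, cwd=cwd)`; the working directory is returned alongside the command for that reason.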
20:51mhenning[d]: oh, yeah I just run the monstrosity of a command each time I run a single test
20:51mhenning[d]: chikuwad[d]: I like to use meson devenv for that
20:51chikuwad[d]: yes, I know meson devenv is a thing but I wanna make the script a bit more portable for etnaviv nonsense, if I ever decide to get back to it
20:51chikuwad[d]: 😅
20:52karolherbst[d]: there is `MESA_VK_DEVICE_SELECT`
20:52chikuwad[d]: the iMX 8MP is _so slow_ at compiling etnaviv it's genuinely easier and faster for me to use qemu-user on my PC and ferry the build over
20:52karolherbst[d]: ohhh...
20:52karolherbst[d]: ~~meson devenv into a network share~~
20:53chikuwad[d]: pain
20:53chikuwad[d]: chikuwad[d]: this is also why I haven't worked on it more
20:53chikuwad[d]: working on it is just such a pain in the ass
20:55mhenning[d]: oh well, you'll just need to work on nvk instead 😛
20:55chikuwad[d]: that's what I'm doing :3
20:56chikuwad[d]: I have limited time and I'd rather spend it staring at NVK code trying to make sense of it than dealing with the logistics of cross compiling and moving builds between devices
20:56gfxstrand[d]: mhenning[d]: Yeah, mohamexiety[d] just pinged me about it. I was at a Khronos F2F and then the beach for 2 weeks. Getting back to stuff today.
20:56chikuwad[d]: especially with the limited I/O my devboard has
20:58real2pac911: evrything you are hired to do is pain, technically it never works out well for performance, it's like you do not see results of the works where as codebase is so large for those ways, that it's very easy to misread code and falsely diagnose the issue. inherent problem of the paradigm that is accepted to be allowed for commodity systems by some eager powers who rule the world like illuminaties etc. That is why i published my alternative works , don
20:58real2pac911: in that sense for your programming methods i am not better than you are
21:00chikuwad[d]: I'm not even paid for any of this (yet) I'm just a hobbyist
21:03chikuwad[d]: anyway, I'm finally studying optimization and vectorization for the first time in an academic setting and that's been fun so far
21:03chikuwad[d]: c:
21:12mangodev[d]: i'd love to live and breathe optimization, but the most i have right now is highly informal education
21:12chikuwad[d]: the course started last week for me, today was lecture #2
21:13mangodev[d]: there are some ideas here and there, but it's mostly disjointed and i want to put it *toward* something
21:13chikuwad[d]: and we're already talking about instruction pipelines and latencies
21:14chikuwad[d]: first assignment is also already out and we have to optimize a render function without multithreading it or using the GPU
21:14chikuwad[d]: so purely CPU-based optimization
21:14mangodev[d]: that sounds fun
21:15chikuwad[d]: it's funny being in a CS master's as a physics grad
21:16mangodev[d]: the math must be a lot easier
21:16chikuwad[d]: so far, it has been
21:16mangodev[d]: i mean
21:16mangodev[d]: the math seems to get harder the more functional the concepts you approach
21:16chikuwad[d]: I'm a bit worried about the advanced graphics course I'll have later in the degree, however
21:17chikuwad[d]: mangodev[d]: indeed, and I've studied the basics of quantum mechanics so I _should_ be fine
21:17chikuwad[d]: e-e
21:18mangodev[d]: i've been slowly chipping away at implementing an abridged minimal perfect hash function in zig, and the math in half of these papers is something else
21:19mangodev[d]: i haven't gotten far enough in formal education to know what half of it is even supposed to mean, so it just feels like a ton of magic incantations
21:19chikuwad[d]: :Pat:
21:19chikuwad[d]: you'll get there
21:20mangodev[d]: integrals and very, very long polynomials seem like core parts of some areas of programming
21:20chikuwad[d]: indeed
21:20karolherbst[d]: okay.. with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33914 that one really bad case goes from 3 to 5 warps/SM 🙃
21:21chintranhui: You chose a wrong person to bully with, i am so much loved and i have informators and people who care about me, they signal me to survive, the thing is the soon as you keep orbiting around illuminati space , they are coming to kill you soon, but i am tremendously strong under this pressure, cause it had happened 43 years in a row to me like this, i am always many steps ahead.
21:23karolherbst[d]: phomes_[d]: this branch _could_ help a few games, but not sure if it's any of the ones you are tracking... https://gitlab.freedesktop.org/karolherbst/mesa/-/commits/MR_33914?ref_type=heads
21:25karolherbst[d]: I should check out metro exodus...
21:40karolherbst[d]: mhh.. stats are odd:
21:40karolherbst[d]: Totals from 2164 (0.18% of 1212873) affected shaders:
21:40karolherbst[d]: CodeSize: 52068832 -> 51905456 (-0.31%); split: -0.35%, +0.03%
21:40karolherbst[d]: Number of GPRs: 141323 -> 141355 (+0.02%); split: -0.01%, +0.03%
21:40karolherbst[d]: SLM Size: 6656 -> 6648 (-0.12%)
21:40karolherbst[d]: Shared Size: 6047900 -> 5913540 (-2.22%)
21:40karolherbst[d]: Static cycle count: 65736985 -> 65815297 (+0.12%); split: -0.06%, +0.18%
21:40karolherbst[d]: Spills to memory: 725 -> 721 (-0.55%)
21:40karolherbst[d]: Fills from memory: 725 -> 721 (-0.55%)
21:40karolherbst[d]: Spills to reg: 16576 -> 16577 (+0.01%)
21:40karolherbst[d]: Fills from reg: 11395 -> 11396 (+0.01%)
21:40karolherbst[d]: Max warps/SM: 69178 -> 69176 (-0.00%); split: +0.03%, -0.03%
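The percentages in a report like the one above are relative changes, (after - before) / before, taken over the totals of the 2164 affected shaders; that is also why "Max warps/SM" reads 69178 rather than a per-shader value. A small sketch reproducing the formatting (a hypothetical helper, not the actual report script):

```python
def stat_line(name, before, after):
    """Format one summed before/after statistic as 'name: X -> Y (+Z.ZZ%)'."""
    pct = (after - before) / before * 100.0
    return f"{name}: {before} -> {after} ({pct:+.2f}%)"


print(stat_line("Shared Size", 6047900, 5913540))
print(stat_line("Max warps/SM", 69178, 69176))
```

A 2-warp drop over 69178 rounds to `-0.00%`, even though the `split:` figures suggest roughly +0.03% and -0.03% of movement in opposite directions, so the total hides per-shader swings.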
21:41karolherbst[d]: ohh that MR also touches scratch memory, right...
21:49phomes_[d]: karolherbst[d]: it did not help the ones in my set
21:49karolherbst[d]: 😢
21:50karolherbst[d]: it also didn't help the compute stuff I was thinking about, but I think there needs to be something done before it can be helpful there...
21:51mohamexiety[d]: did you try some mega perf branch with all of your compute changes together? :thonk:
21:52mohamexiety[d]: maybe it's all dependent on each other
21:52karolherbst[d]: nah.. I mean it didn't lower shared memory even though it could in theory
21:52karolherbst[d]: but yes, it will depend on lowering shared memory usage 🙃
21:52esdrastarsis[d]: I think I found a kernel regression, this test fails with a device loss in a full cts run on kernel 7.0.x (both cachyos kernel and the vanilla kernel compiled and installed manually), but it works on 6.19.14-arch (TU117):
21:52esdrastarsis[d]: Test case 'dEQP-VK.api.buffer_view.access.dedicated_alloc.buffer_view_memory_test_partial_offset0_with_buffer_suballocated_image_dedicated_alloc_graphics'..
21:52esdrastarsis[d]: DeviceLost (vk.queueSubmit(queue, 1u, &submitInfo, *fence): VK_ERROR_DEVICE_LOST at vkCmdUtil.cpp:334)
21:52esdrastarsis[d]: DONE!
21:52esdrastarsis[d]: Test run totals:
21:52esdrastarsis[d]: Passed: 11315/12914 (87.6%)
21:52esdrastarsis[d]: Failed: 4/12914 (0.0%)
21:52esdrastarsis[d]: Not supported: 1591/12914 (12.3%)
21:52esdrastarsis[d]: Warnings: 4/12914 (0.0%)
21:52esdrastarsis[d]: Waived: 0/12914 (0.0%)
21:53karolherbst[d]: Max warps/SM: 16
21:53karolherbst[d]: Num GPRs: 96
21:53karolherbst[d]: Shared size: 22560
21:53karolherbst[d]: is one of them
21:53karolherbst[d]: shared mem I need to reach for 20 warps/SM: 20480
21:54karolherbst[d]: and the shader is like two loops accessing two variables independently
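A back-of-the-envelope occupancy estimate, sketched in Python: resident workgroups per SM are bounded by a workgroup limit, the warp limit, the register file and shared memory, and warps/SM is the tightest bound times warps per workgroup. The hardware limits are assumptions: 48 warps/SM matches the baseline in the log, while 100 KiB of shared memory, 64K registers and 16 workgroups per SM are guesses that happen to reproduce the numbers above. Allocation granularity is ignored. The GPR count of the `1, 1, 26` shader and the 128-thread workgroup of the 96-GPR shader are hypothetical, since the log doesn't give them. This is not NAK's or CUDA's actual calculation.

```python
def max_warps_per_sm(threads_per_wg, shared_per_wg, gprs,
                     max_warps=48, max_wgs=16,
                     smem_per_sm=100 * 1024, regs_per_sm=64 * 1024,
                     warp_size=32):
    """Rough warps/SM estimate from the tightest per-SM resource limit.

    ASSUMPTION: default limits are illustrative, not queried from hardware.
    Ignores allocation granularity and any reserved shared memory.
    """
    warps_per_wg = -(-threads_per_wg // warp_size)  # ceil division
    regs_per_warp = max(gprs, 1) * warp_size
    limits = [
        max_wgs,                                           # workgroup slots
        max_warps // warps_per_wg,                         # warp slots
        (regs_per_sm // regs_per_warp) // warps_per_wg,    # register file
    ]
    if shared_per_wg > 0:
        limits.append(smem_per_sm // shared_per_wg)        # shared memory
    return min(limits) * warps_per_wg
```

With these assumptions the `1, 1, 26` shader is shared-memory bound at 3 warps, and the 96-GPR shader gets 16 warps at 22560 bytes but 20 at 20480, where shared memory and registers both cap it at five workgroups.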
21:54marysaka[d]: esdrastarsis[d]: what do you have in the kernel logs?
21:54marysaka[d]: also what version of mesa and CTS
21:55esdrastarsis[d]: marysaka[d]: ```
21:55esdrastarsis[d]: [ 244.559627] nouveau 0000:29:00.0: gsp: mmu fault queued
21:55esdrastarsis[d]: [ 244.567137] nouveau 0000:29:00.0: gsp: rc engn:00000001 chid:15 gfid:0 level:2 type:31 scope:1 part:233 fault_addr:0000003ffdd40000 fault_type:00000002
21:55esdrastarsis[d]: [ 244.567153] nouveau 0000:29:00.0: fifo:000000:000f:000f:[deqp-vk[1128]] errored - disabling channel
21:55esdrastarsis[d]: [ 244.567162] nouveau 0000:29:00.0: deqp-vk[1128]: channel 15 killed!
21:55esdrastarsis[d]: ```
21:56esdrastarsis[d]: marysaka[d]: mesa-git and vulkan-cts-1.4.5.3-0-gb17cf9b3863c44aea6f5e37d654d729f56de12ed
21:58esdrastarsis[d]: I'll try to reproduce with mesa 26.0.5
21:59esdrastarsis[d]: yeah, it also happens on mesa 26.0.5
22:01airlied[d]: does running just that test reproduce it?
22:03esdrastarsis[d]: airlied[d]: No 🐸
22:04mohamexiety[d]: it doesn't seem to happen on ampere/ada/blackwell too otherwise it would have been caught :thonk:
22:04marysaka[d]: That looks oddly similar to the MMU issues we were seeing when enabling compression/large pages...
22:04marysaka[d]: I think openrm have some workaround on page table transitions for Turing specifically, maybe it could be related?
22:11marysaka[d]: esdrastarsis[d]: run the test before the one that devicelost too just in case
22:15phomes_[d]: I hope that you are not annoyed by weird XCOM 2 observations... Look at the durations of draw calls in depth pass 1. For nvk the duration increases for each draw call. On prop it does not
22:15phomes_[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1498809858193887413/image.png?ex=69f282f4&is=69f13174&hm=55abc30dc734f198b04b9d1348cb9882d6901483c728fca887793f1d3031bc10&
22:19marysaka[d]: marysaka[d]: Seems to not reproduce on Blackwell, I could pull some TU11X based GPU on the testbench next week to see if I can reproduce it
22:20marysaka[d]: phomes_[d]: that's odd especially if we only have some descriptors binding changes and then draws one after the other :blobcatnotlikethis:
22:20esdrastarsis[d]: marysaka[d]: It's working with `deqp-vk -n "dEQP-VK.api.buffer_view.access.*"`
21:21esdrastarsis[d]: Why is Turing so "special"? 🐸
21:21esdrastarsis[d]: there are a lot of arch changes between Turing and Ampere
22:21marysaka[d]: because I remember seeing weird workaround on MMU migration for large pages to small pages at one point on openrm 😅
22:22_lyude[d]: fun nouveau fact: reading the scanline position in nouveau is and has been broken for a long time
22:22marysaka[d]: esdrastarsis[d]: are you running with WSI? like is that popping some window around when running the CTS?
22:22marysaka[d]: and if so, is the display on that GPU or another GPU
22:23mhenning[d]: esdrastarsis[d]: they were in the middle of changing everything for turing and it took another generation for the dust to settle
22:23_lyude[d]: volta is even more special!
22:23esdrastarsis[d]: marysaka[d]: I don't see any window popping
22:24esdrastarsis[d]: btw, the test before the problematic test fails too:
22:24esdrastarsis[d]: Test case 'dEQP-VK.api.buffer_view.access.suballocation.buffer_view_memory_test_partial_offset1_compute'..
22:24esdrastarsis[d]: Pass (BufferView test)
22:24esdrastarsis[d]: Test case 'dEQP-VK.api.buffer_view.access.dedicated_alloc.buffer_view_memory_test_complete_with_buffer_suballocated_image_dedicated_alloc_graphics'..
22:24esdrastarsis[d]: Fail (BufferView test failed. expected: 0 actual: 256)
22:24esdrastarsis[d]: Test case 'dEQP-VK.api.buffer_view.access.dedicated_alloc.buffer_view_memory_test_partial_offset0_with_buffer_suballocated_image_dedicated_alloc_graphics'..
22:24esdrastarsis[d]: DeviceLost (vk.queueSubmit(queue, 1u, &submitInfo, *fence): VK_ERROR_DEVICE_LOST at vkCmdUtil.cpp:334)
22:24esdrastarsis[d]: DONE!
22:24marysaka[d]: yeah it's the one faulting I suppose
22:24mhenning[d]: esdrastarsis[d]: Could you file a bug report for that, so we can keep track of the issue?
22:25esdrastarsis[d]: mhenning[d]: Where? I can do that when I have time
22:25marysaka[d]: https://gitlab.freedesktop.org/mesa/mesa/-/work_items
22:25marysaka[d]: label as NVK
22:26karolherbst[d]: okay.. so the alloc pass has the problem that it only splits if there is a barrier...
22:26karolherbst[d]: so if there isn't a barrier between the last use of a variable, and the first use of another variable, it won't alias the allocation
22:26karolherbst[d]: I think
22:27karolherbst[d]: which kinda makes sense, but it might be better to simply insert more barriers instead
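A toy version of the barrier rule described above, assuming straight-line code with instructions numbered in program order (loops and other control flow, which a real pass must handle, are ignored): two shared variables may only share bytes if a barrier sits strictly between the last use of one and the first use of the other. Offsets are assigned first-fit. The names and the `(size, first_use, last_use)` representation are made up for illustration; this is not the actual Mesa pass.

```python
def can_alias(a, b, barriers):
    """True if live ranges a and b, as (first_use, last_use), are split by a barrier."""
    (a_first, a_last), (b_first, b_last) = sorted([a, b])
    # Overlapping ranges have a_last >= b_first, so no barrier can fit between.
    return any(a_last < p < b_first for p in barriers)


def allocate_shared(variables, barriers):
    """First-fit shared memory offsets; returns (offsets, total_size).

    `variables` maps name -> (size, first_use, last_use) in instruction indices.
    """
    placed = []   # (offset, size, live_range) of already allocated variables
    offsets = {}
    for name, (size, first, last) in sorted(variables.items(),
                                            key=lambda kv: kv[1][1]):
        rng = (first, last)
        # Only variables we may NOT alias with constrain our placement.
        conflicts = sorted((off, sz) for off, sz, other in placed
                           if not can_alias(rng, other, barriers))
        offset = 0
        for off, sz in conflicts:
            if offset + size <= off:
                break                      # fits in the gap before this one
            offset = max(offset, off + sz)  # otherwise skip past it
        placed.append((offset, size, rng))
        offsets[name] = offset
    total = max((off + sz for off, sz, _ in placed), default=0)
    return offsets, total
```

For the "two loops accessing two variables independently" shape, a single barrier between the loops halves the footprint, which is the trade-off behind inserting more barriers: less shared memory, at the cost of an extra synchronization point.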
22:35_lyude[d]: Ah. of course. it's broken because no one ever wrote the code for reading the scanline on gsp
22:44_lyude[d]: airlied[d]: am I right in assuming that actually reading back the horizontal scanline position is not actually really a thing these days on modern video hardware?
22:46airlied[d]: Not sure anything really needs it though I think some of the vblank accuracy might like it
22:46_lyude[d]: gotcha - yeah, I don't actually even see a gsp command for reading in the hline
22:53_lyude[d]: airlied[d]: totally just something I happened to spot when looking through this code:
22:53_lyude[d]: #define NV0073_CTRL_SYSTEM_CAPS_HDMI21_SW_ACR_BUG_3275257 1:0x40
22:54_lyude[d]: in openrm I mean
23:16_lyude[d]: heheheehhhhhhh,... hooray, the only difficult part of implementing this apparently is entirely unchanged with gsp vs. acr
23:18_lyude[d]: in fact I'm not even actually sure if it's worth doing this through the GSP at all. Using gv100_head_state gv100_head_rgpos in r535_head seems to work perfectly
23:56airlied[d]: the latency of going via gsp would be too long