04:10fdobridge: <gfxstrand> Ugh... Something is wrong with the shared synchronization test
04:11fdobridge: <gfxstrand> They're faulting but only some of the time, meaning we're either not filling out descriptor sets properly (seems unlikely) or something is failing to bind. 😕
04:11fdobridge: <airlied> gimme some test names?
04:19fdobridge: <gfxstrand> `dEQP-VK.synchronization.signal_order.shared_binary_semaphore.*_opaque_fd`
04:20fdobridge: <gfxstrand> `dEQP-VK.synchronization.signal_order.shared_binary_semaphore.write_ssbo*opaque_fd` (edited)
04:21fdobridge: <gfxstrand> That'll crash within the first few.
04:21fdobridge: <gfxstrand> It's not always the first test but it'll crash quick
04:24fdobridge: <airlied> oh I get a gpu trap and warp errors and stuff
04:24fdobridge: <gfxstrand> Yup
04:25fdobridge: <airlied> fifo: fault 00 [VIRT_READ] at 0000007ffffb1000 engine 40 [gr] client 00 [GPC1/T1_0] reason 02 [PTE] on channel 5 [01ff5b3000 deqp-vk]
04:25fdobridge: <airlied> will take a closer look when I get a minute
04:25fdobridge: <airlied> nearly finished a piglit single thread run with zink
04:32fdobridge: <gfxstrand> The weird thing is... I don't know where 0x7ffffb1000 is coming from
04:35fdobridge: <gfxstrand> It's not an SSBO binding or a vertex buffer or a shader
04:38fdobridge: <airlied> I see it in the vm debug trace
04:39fdobridge: <airlied> Maybe should add some names :-p
04:46fdobridge: <gfxstrand> Hrm... It's the descriptor pool
04:46fdobridge: <gfxstrand> Why would it be faulting trying to read the descriptor set
04:47fdobridge: <gfxstrand> That seems very odd indeed
04:48fdobridge: <gfxstrand> Someone's not waiting for the GPU to be idle I think
04:52fdobridge: <airlied> Yeah sounds like the object is freed before finished with
04:53fdobridge: <gfxstrand> Yeah, except either GDB is lying to me or it's idling both devices before freeing anything
04:58fdobridge: <gfxstrand> Actually...
04:58fdobridge: <gfxstrand> These tests are bogus
04:58fdobridge: <gfxstrand> *sigh*
04:59fdobridge: <gfxstrand> maybe not
05:01fdobridge: <gfxstrand> Yeah, there's a test bug. *sigh*
05:01fdobridge: <gfxstrand> But the test bug is just what's causing the fault
05:02fdobridge: <gfxstrand> cross-queue sync is broken somehow and that's causing the fails
05:08fdobridge: <gfxstrand> Here's a CTS patch:
05:08fdobridge: <gfxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1137975652142821446/message.txt
05:09fdobridge: <gfxstrand> I'll submit it tomorrow
05:09fdobridge: <gfxstrand> That gets rid of the faults but we still have fails
05:32fdobridge: <gfxstrand> Given that we WFI for every pipeline barrier, I suspect something's wrong with cross-device synchronization.
05:38fdobridge: <gfxstrand> In fact I'm sure there's something wrong with synchronization because the tests do wait on the receiving device and, if sync were working properly, waiting on the receiving device would implicitly wait on the sending device and we wouldn't be faulting.
05:52fdobridge: <airlied> [28376/28376] skip: 1867, pass: 25530, warn: 11, fail: 373, crash: 594 /
05:58fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> How many tests does synchronization2 add? :frog_gears:
06:02fdobridge: <airlied> okay one low hanging fix from the piglit run in an mr
06:22fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> And it got merged before the mesamatrix change :gears:
06:33fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> `ERROR: unknown nir_intrinsic_op terminate` 🤔
08:16fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Why is vulkaninfo not recognizing the NVK ID?: `driverID = UNKNOWN_VkDriverId_value24`
08:19fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I forgot I had an outdated vulkaninfo version 🍩
08:28fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> 🏴
08:28fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1138025810389893130/Screenshot_20230807_112738.png
08:41fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I wonder if the missing graphics are because of missing EXT_attachment_feedback_loop_layout extension 🤔
09:20fdobridge: <airlied> possibly those memory model warnings are meaningful
09:45fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Are nir_intrinsic_discard and nir_intrinsic_terminate different? 🐸
13:16fdobridge: <gfxstrand> Yes
13:17fdobridge: <gfxstrand> This patch got rid of most of my weird spurious fails that seemed to be caused by tests faulting at the same time as destroying nearby contexts.
13:18fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> So treating them the same in codegen is wrong, right? :triangle_nvk:
13:25fdobridge: <gfxstrand> @airlied https://gerrit.khronos.org/c/vk-gl-cts/+/12450
13:27fdobridge: <gfxstrand> Well, discard is one of either terminate or demote. IDK which codegen does right now.
13:33fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Also gamescope is now failing in vkAllocateDescriptorSets for some reason 🤔
13:40fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Here we go again: `Pool size limit reached (offset = 512, bo_size = 216, pool_size = 648)`
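The numbers in that message can be checked by hand: the pool is 648 bytes, 512 are already in use, so only 136 bytes remain and a 216-byte set does not fit (512 + 216 = 728, which is 80 bytes over). A minimal sketch of that bounds check, assuming a simple bump allocator; `pool_can_fit` is a hypothetical helper written for illustration, not NVK's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not taken from NVK): a bump allocator can place a
 * descriptor set only if it fits between the current offset and the end
 * of the pool. */
static bool pool_can_fit(uint64_t offset, uint64_t set_size, uint64_t pool_size)
{
    /* Assumption: the allocator never advances past the end of the pool. */
    assert(offset <= pool_size);

    /* Compare against the remaining space so the check cannot overflow. */
    return set_size <= pool_size - offset;
}
```

With the logged values, `pool_can_fit(512, 216, 648)` is false because only 136 bytes are left.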
15:01fdobridge: <gfxstrand> I don't trust our descriptor pools
15:04fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Can they be fixed?
15:04fdobridge: <gfxstrand> I'm sure they can
15:08fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> It's weird how gamescope didn't fail back on July 23rd 🤔
15:08fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> In that vkAllocateDescriptorSets call (I even tried downgrading gamescope with no success)
15:18fdobridge: <gfxstrand> Descriptor sets have changed some in the not-too-distant past
16:08fdobridge: <gfxstrand> July 23 seems pretty recent, though.
16:15fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> :triangle_nvk: moment 🤔
16:15fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1138143396708495360/message.txt
16:17fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> But anyway commit e311b24b793222f5fc458806104ce30028493ad5 is the problematic one
16:18fdobridge: <dadschoorse> the only thing special about gamescope's descriptor sets is an array of ycbcr samplers, otherwise it's just vulkan 101 stuff
16:27fdobridge: <gfxstrand> Weird... there's only one hunk in that commit that should make any difference at all and it should only affect YCbCr
16:29fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> gamescope uses YCbCr stuff so it's obviously affected by this change
16:31fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> So we have to reduce the BO size somehow or just increase the pool size 🎱
16:31fdobridge: <gfxstrand> Oh, I bet I know the problem, then. I don't think we took YCbCr into account in `GetDescriptorSetLayoutSupport()` or `CreateDescriptorPool()`
16:31fdobridge: <gfxstrand> @mohamexiety ^^
16:32fdobridge: <gfxstrand> I don't remember what the rules around that are.
16:33fdobridge: <mohamexiety> I'll take a look at it in a bit, sorry about that. I initially looked at it while doing this and thought that it would've been handled
16:33fdobridge: <gfxstrand> For `GetDescriptorSetLayoutSupport()`, it takes a full `VkDescriptorSetLayoutCreateInfo` so we just fish the details out the same way as descriptor set create.
16:33fdobridge: <gfxstrand> I don't remember how pools are supposed to handle it.
16:33fdobridge: <mohamexiety> yeah
16:34fdobridge: <gfxstrand> Hrm... No, pools should be taken care of
16:34fdobridge: <gfxstrand> Those are handled by `VkSamplerYcbcrConversionImageFormatProperties::combinedImageSamplerDescriptorCount` which I think we properly set
16:34fdobridge: <mohamexiety> yep, that's what I initially thought
16:35fdobridge: <mohamexiety> so I guess the fault lies in `GetDescriptorSetLayoutSupport()`
16:35fdobridge: <mohamexiety> I didn't touch that one at all
16:35fdobridge: <dadschoorse> oh, gamescope doesn't use VkSamplerYcbcrConversionImageFormatProperties::combinedImageSamplerDescriptorCount
16:35fdobridge: <gfxstrand> It should!
16:36fdobridge: <mohamexiety> time to file a gamescope patch I guess 🐸
16:36fdobridge: <gfxstrand> Yeah, that just needs the same `max_plane_count` stuff.
16:36fdobridge: <dadschoorse> yeah I didn't know that this even exists and it's not a problem on radv
16:37fdobridge: <mohamexiety> got it, will take care of that then. sorry!
16:37fdobridge: <gfxstrand> No worries. I missed it too. 🙃
16:37fdobridge: <gfxstrand> It all depends on how tight your pool allocation ends up being. 😕
16:37fdobridge: <mohamexiety> RADV does set that bit correctly though. I sanity checked my changes with both RADV and ANV
16:38fdobridge: <dadschoorse> radv's image descriptors always have space for two planes because msaa has two planes with compression
16:39fdobridge: <dadschoorse> but I'm surprised that this didn't cause validation errors
16:42fdobridge: <gfxstrand> There isn't really a way to validate that, unfortunately.
16:44fdobridge: <dadschoorse> is it valid to include VkSamplerYcbcrConversionImageFormatProperties for all format queries, or is it only allowed for formats that can be used with ycbcr conversion?
16:54fdobridge: <dadschoorse> @asdqueerfromeu can you try <https://github.com/DadSchoorse/gamescope/tree/ycbcr-descriptor-count>?
17:08fdobridge: <gfxstrand> I don't see anything that implies it's illegal to include in a non-YCbCr format query. It should return 1 in that case.
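On the application side, the sizing rule discussed above amounts to multiplying the combined image sampler count requested from the pool by `combinedImageSamplerDescriptorCount`. A minimal sketch of that arithmetic, assuming the value has already been queried; `pool_sampler_descriptor_count` and its parameter names are hypothetical and are not taken from gamescope or Mesa:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many COMBINED_IMAGE_SAMPLER descriptors to
 * request in a VkDescriptorPoolSize when each sampler may expand into
 * several internal descriptors (one per plane, at most).
 *
 * combined_count stands in for the queried
 * VkSamplerYcbcrConversionImageFormatProperties::combinedImageSamplerDescriptorCount;
 * per the discussion above it should be 1 for non-YCbCr formats. */
static uint32_t pool_sampler_descriptor_count(uint32_t max_sets,
                                              uint32_t samplers_per_set,
                                              uint32_t combined_count)
{
    /* Assumption: the driver never reports 0 for this value. */
    assert(combined_count >= 1);

    /* No overflow handling here; a real caller would want to check it. */
    return max_sets * samplers_per_set * combined_count;
}
```

For example, two sets with an array of four samplers each need 8 descriptors when the count is 1, and 24 when the driver reports 3 for a three-plane format. A pool sized with the first number on a driver that reports the second would run out of space during `vkAllocateDescriptorSets`.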
20:12fdobridge: <airlied> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24532 anyone want to rb that?
20:13fdobridge: <mohamexiety> hm. interesting that NV reports 4000
20:13fdobridge: <airlied> yeah not sure if that's some hedge against planes or something
20:14fdobridge: <airlied> but yeah if we change to match them we should do it in both places
20:29fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> `[250162.551376] nouveau 0000:01:00.0: gr: TRAP ch 3 [00ff9cc000 gamescope]` :triangle_nvk:🪤
20:38fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> :nope_gears:
20:38fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1138209578752933898/Screenshot_20230807_233710.png
23:30fdobridge: <gfxstrand> Woo! Found a kernel bug by reading the code. 🙃
23:39fdobridge: <airlied> @gfxstrand nice, is it the right bug? 🙂
23:40fdobridge: <gfxstrand> I don't know
23:40fdobridge: <gfxstrand> I doubt it
23:43fdobridge: <gfxstrand> https://patchwork.freedesktop.org/patch/551659/
23:53fdobridge: <gfxstrand> Yeah, didn't fix anything
23:53fdobridge: <gfxstrand> I didn't figure