02:05karolherbst[d]: mhh anyway.. that mov stuff is fallout from par_copy being added like this:
02:05karolherbst[d]: r76..78 = ldsm.16..m8n8.trans.x2 [r59+0x2800]
02:05karolherbst[d]: par_copy r78 = r55, r79 = r56
02:05karolherbst[d]: r78..80 = hmma.m16n8k16.f16 r60..64 r76..78 r78..80
02:05karolherbst[d]: par_copy r0 = r47, r1 = r48, r47 = r0, r48 = r1
02:05karolherbst[d]: r0..2 = hmma.m16n8k16.f16 r64..68 r76..78 r0..2
02:05karolherbst[d]: par_copy r0 = r39, r1 = r40, r39 = r0, r40 = r1
02:05karolherbst[d]: r0..2 = hmma.m16n8k16.f16 r68..72 r76..78 r0..2
02:05karolherbst[d]: par_copy r0 = r31, r1 = r32, r31 = r0, r32 = r1
02:05karolherbst[d]: r0..2 = hmma.m16n8k16.f16 r72..76 r76..78 r0..2
02:06karolherbst[d]: and I have no idea why it's doing this
02:40mhenning[d]: I think it's trying to legalize the vector
02:41mhenning[d]: which is to say that eg. 31, 32 is unaligned so it's moving it to 0, 1 so it can use it as an aligned reg pair
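The alignment point mhenning makes can be sketched in a few lines. This is only an illustration: the rule used here (a vector of N consecutive registers must start at a multiple of N rounded up to a power of two) is an assumption, and `required_align` and `is_aligned` are hypothetical helpers, not NAK functions.

```rust
/// Alignment (in registers) assumed for a vector of `comps` consecutive
/// registers: the component count rounded up to a power of two.
/// This rule is an assumption made for illustration only.
fn required_align(comps: u32) -> u32 {
    comps.next_power_of_two()
}

/// True if a vector of `comps` registers starting at `first_reg`
/// satisfies the assumed alignment rule.
fn is_aligned(first_reg: u32, comps: u32) -> bool {
    first_reg % required_align(comps) == 0
}
```

Under that assumption, `is_aligned(31, 2)` is false and `is_aligned(0, 2)` is true, which matches the `par_copy r0 = r31, r1 = r32` seen in the dump above.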
04:51cantrememberthose: on may 17 i was so horribly stoned, and tried to talk with the channel using my pocket calculator on phone. I see , I did not succeed in this rally. but the idea was actually correct, i just only made mistakes. so first correct calculations are 224 based index we get reference point as 224, where 232 we get 232 240 we get 240 and 248 we get 248. So this was correct that they have to
04:51cantrememberthose: be aligned properly however. So the calculations look like this. 224−60−60−60−60−32−128−240+128+64 as the forth index 60 aka 124-8+30=146 and 224−64−64−64−64−32−128−240+128+64=-240 as third aka 120-16+32=136. Other than that everything applies the same, but may 17 i was just writing something i could not grasp afterwards, i remember i was smoking strong cannabis :D.
05:22humblejunkie: Rest of it seems correct to me, and until the index is smaller than the whatever reference point chosen per bank of answer sets, you can derive the intermediate values, and it is always smaller, other than that one day where i made only mistake after mistake, i had been nearly correct or no big mistakes spotted now. The thing is my pal for years has been telling me, that cannabis is very
05:22humblejunkie: rarely not good to certain types of people, but there are many of the different ones, some indeed offput me for fact these days the least, i am much sharper being sober so to speak.
05:46marodontsik: Mostly the thing i was trying to say was, that at one end you have to compensate with the algorithm i dropped, you'll have to adjust the inverse value from constant and it's subtract from bigger to a certain distance, but correction is not needed when you manipulate the index instead during runtime, based of higher reference point, this is the find that i was trying to upload, that my body
05:46marodontsik: is very fast adjusting and reactive is known to me. I have smoked cigarettes for years, and when i skip couple of hours doing one, i get head spins, or say 6hours i do not smoke a ciggie, i get a feeling like i never did any, like i was smoking the first one in my life. :D
08:31karolherbst[d]: mhenning[d]: oh, that might be it
08:47karolherbst[d]: mhenning[d]: okay.. yeah I see what's happening now. The phi nodes aren't getting aligned by RA: `div 16x4 %142 = phi b0: %126 (0.000000, 0.000000, 0.000000, 0.000000), b12: %266` is one of those on the NIR side
08:49karolherbst[d]: so I guess if we skip splitting up phis, we need to make RA smarter so it also aligns those properly
09:17karolherbst[d]: doesn't help that this node is also used in two hfma2s
10:30karolherbst[d]: mhhhh
10:31karolherbst[d]: those vec2 values still go through the `.try_find_unused_reg_range(0, 1, 1, 0)` code...
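To illustrate why the alignment passed to a range search matters for vec2 values, here is a minimal sketch of a first-fit search over a register file. The function name, its parameters and their order are hypothetical and do not reflect the real signature of `try_find_unused_reg_range` in NAK.

```rust
/// First-fit search for `comps` consecutive free registers, starting at or
/// after `start`, whose first register is a multiple of `align`.
/// `used[i]` is true when register `i` is already allocated.
/// Hypothetical helper for illustration; not the NAK implementation.
fn find_free_aligned_range(
    used: &[bool],
    start: usize,
    align: usize,
    comps: usize,
) -> Option<usize> {
    if align == 0 || comps == 0 {
        return None;
    }
    // Round `start` up to the next multiple of `align`.
    let mut reg = (start + align - 1) / align * align;
    while reg + comps <= used.len() {
        // Accept the range only if every register in it is free.
        if used[reg..reg + comps].iter().all(|&u| !u) {
            return Some(reg);
        }
        reg += align;
    }
    None
}
```

With only register 0 occupied, a search for two registers with `align = 1` returns 1, an odd start, while the same search with `align = 2` returns 2. A vec2 allocated through an alignment-1 path can therefore end up on an unaligned pair.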
10:43snowycoder[d]: Kepler should have BAR memory heap, right?
10:43snowycoder[d]: for now NVK only advertises BAR heap type from Maxwell onwards.
11:58themaister[d]: Does NVK take a performance hit for 2D_ARRAY vs 2D image descriptors?
11:58themaister[d]: Ran into a case where texture2DArray in shader cannot properly read from a 2D view, but a 2D_ARRAY view with 1 layer works in all cases.
11:59themaister[d]: (D3D requires this to work)
12:46karolherbst[d]: it's another coordinate in the shader, which has some RA implications
12:48gfxstrand[d]: themaister[d]: Nope. They're literally the same, IIRC. Just use arrays everywhere if you want. The only potential cost is a register.
12:49themaister[d]: ok, because they're not the same on proprietary apparently
12:49themaister[d]: querying dimensions for 2D array returns 0 if the view is 2D
12:50themaister[d]: maybe they just skip writing that to the descriptor or something
12:50themaister[d]: and there were some robustness issues as well ...
12:50themaister[d]: I don't want to emit texture2darray in every shader fwiw
12:50themaister[d]: just the descriptor
13:01snowycoder[d]: themaister[d]: Isn't this a bug?
13:11themaister[d]: Vulkan requires that types match, so accessing a 2D descriptor as a 2DArray yields undefined results
13:11themaister[d]: we should fix that in some maint extension
13:11karolherbst[d]: ~~I know a weird trick~~
13:11themaister[d]: layer = max(layers, min(width, 1))
13:11karolherbst[d]: ~~or rather, it's Mike that knows it~~
13:12themaister[d]: but just using 2D array view everywhere on NV is an easier fix
13:12themaister[d]: and less bs heroics in vkd3d-proton
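The clamp themaister wrote above, `layer = max(layers, min(width, 1))`, can be written out as a small function. The interpretation in the comments (a 2D view reporting 0 layers, a width of 0 meaning a null descriptor) is an assumption drawn from the surrounding discussion, and `effective_layers` is a hypothetical name.

```rust
/// Transcription of `layer = max(layers, min(width, 1))`.
/// Assumed meaning: if the query reports 0 layers but the image has a
/// non-zero width, report 1 layer instead; a zero width stays at 0.
fn effective_layers(layers: u32, width: u32) -> u32 {
    layers.max(width.min(1))
}
```

For example, `effective_layers(0, 64)` gives 1, `effective_layers(6, 64)` gives 6, and `effective_layers(0, 0)` stays 0.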
13:12karolherbst[d]: so for d3d you need to do 2DArray texture ops on a 2D image?
13:12themaister[d]: d3d allows that yes
13:12themaister[d]: and the opposite
13:12karolherbst[d]: yeah soo there is a spec compliant way to do that
13:12themaister[d]: you can emit 2darray in all shaders everywhere on Vulkan, yes
13:12karolherbst[d]: you dma-buf export the 2Darray and import it as 2D and vice versa
13:13themaister[d]: no, this is a view
13:13themaister[d]: not image
13:13themaister[d]: images are 2D always in Vulkan
13:13themaister[d]: the view is 2D or arrayed
13:13karolherbst[d]: ohh, right
13:13themaister[d]: and emitting texture2darray in every shader is an unacceptable solution
13:13karolherbst[d]: mhh
13:13karolherbst[d]: yeah okay, that's more annoying
13:14themaister[d]: wrote tests for all of this today at least
13:14karolherbst[d]: long-term probably want an extension for it or so...
13:15themaister[d]: it's on the maintenance tracker
13:15karolherbst[d]: but yeah.. the array length is part of the texture header
13:16themaister[d]: and if 2D view sets that to 0
13:16themaister[d]: it all makes sense
13:16karolherbst[d]: yeah, it probably does
13:16themaister[d]: assuming QuerySize just returns what's in the descriptor as-is
13:16karolherbst[d]: it's a native instruction
13:17themaister[d]: sure
13:17themaister[d]: I assume HW just returns what's in the descriptor payload anyway
13:17karolherbst[d]: mhhh
13:17karolherbst[d]: sooo
13:17karolherbst[d]: not really
13:18karolherbst[d]: for 2D images it either returns 0 or 1
13:18karolherbst[d]: not what's in the header
13:18themaister[d]: aanyway
13:19karolherbst[d]: ehhh..
13:19karolherbst[d]: I think for MSAA images it might return 1 and not 0
13:20karolherbst[d]: might want to double check that with nvidia
13:20karolherbst[d]: query on a 2D view that is
13:21karolherbst[d]: for non mipmapped images it might even return something else, but not sure you'll have to care about that
13:21karolherbst[d]: non mipmapped doesn't mean 0 levels tho
13:21karolherbst[d]: (or 1 depending on how you count)
13:22karolherbst[d]: hope that helps
15:15themaister[d]: I'm talking about layers, not levels
15:22karolherbst[d]: I know
15:23karolherbst[d]: just saying that for MSAA images the behavior will be different (probably)
15:24karolherbst[d]: and that it depends on if the image is mipmapped or not
15:24karolherbst[d]: *that = the value
15:26karolherbst[d]: it's only special for 2D image views
15:45gfxstrand[d]: themaister[d]: Okay, I lied. They're different.
15:45gfxstrand[d]: I don't know how/why they're different
15:45gfxstrand[d]: But yeah they're different texture types
15:46gfxstrand[d]: 😢
16:08tiredchiku[d]: https://tenor.com/view/anakin-liar-star-wars-lying-gif-8634649
16:09karolherbst[d]: gfxstrand[d]: yeah.... though what matters is if TXQ is also different, but given it doesn't encode the shader texture type, but only looks what the header has....
16:25nativederivative: Now i tested a final thing. The reference point of the whole set when placeholder not known has some rules. where indexes are the closest. so hence 224−60−60−60-72−120−220=-248 cause 60 and 62 are closest together, and value can be extracted like this: 256-224=32+130+130=292 so 292-256=36 cache it for later, 256+256+256 (so three duplicates) -292=476-32-256-256=-68+36=32 so
16:25nativederivative: 32+224=256 , where 130 comes from 248-183-183=-118 248-118=130 183 comes from 256-73, 73 comes from 124-8+30/2 and 60 is decrementing index so it happens to be 12th element in gaps of 4 it's third. This appears to be the access reference point of the set on base 256 and 512 for computation sake. So that boils down to two data methods and two compute ones being available.
16:26themaister[d]: gfxstrand[d]: https://github.com/HansKristian-Work/vkd3d-proton/pull/2480
16:27themaister[d]: there's a test case for it as well
16:44gfxstrand[d]: Cool
16:45gfxstrand[d]: We could try just doing that in the driver. IDK if anything would break.
16:55themaister[d]: the workaround applies to NV vendorID, not just driverID fwiw
16:56themaister[d]: doubt it'll break NVK
17:22gfxstrand[d]: mohamexiety[d]: Did you ever make any progress on those sparse fails?
17:22mohamexiety[d]: gfxstrand[d]: not yet, managed to repro though and was looking into them now actually
17:23gfxstrand[d]: Cool
17:23gfxstrand[d]: I'll leave you to it, then.
17:23mohamexiety[d]: test output is just black/nothing
18:42snowycoder[d]: Kepler suldga/sustga seem to ignore membars :3
18:42snowycoder[d]: Weird arch
18:44mohamexiety[d]: I mean it has sus in the name
19:12gfxstrand[d]: snowycoder[d]: Oh?
19:13gfxstrand[d]: How did you come to that conclusion?
19:25snowycoder[d]: gfxstrand[d]: ```
19:25snowycoder[d]: dEQP-VK.memory_model.message_passing.core11.u32.coherent.control_and_memory_barrier.atomicwrite.subgroup.payload_local.buffer.guard_nonlocal.buffer.comp,Pass,6.241178
19:25snowycoder[d]: dEQP-VK.memory_model.message_passing.core11.u32.coherent.control_and_memory_barrier.atomicwrite.subgroup.payload_local.image.guard_nonlocal.buffer.comp,Fail,6.013088
19:25snowycoder[d]: ```
19:25snowycoder[d]: And all the other `memory_model.message_passing` tests follow the same pattern (when there's an imageLoad/imageStore it fails)
19:25snowycoder[d]: I need to check what cuda uses for image storage barriers
19:25gfxstrand[d]: We should be able to pass the core11 tests somehow
19:26snowycoder[d]: Yes it's one of the last family of tests to fail
19:26gfxstrand[d]: We can't pass the whole memory model but the core11 tests are specifically the ones Kepler can do.
21:00gfxstrand[d]: snowycoder[d]: do 3D images work? I'm seeing what look like a lot of bugs around 3D storage images.
21:43mohamexiety[d]: gfxstrand[d]: what does reinterpreting an image mean in the context of these tests btw? I am a bit confused
21:46gfxstrand[d]: I'm not 100% sure but I think it means using a different format or maybe aliasing with an image with a different format.
21:47mohamexiety[d]: hm
21:48gfxstrand[d]: snowycoder[d]: Okay, after my 2nd read through of the lowering code, I think I finally understand how it all works. Hopefully this means I can also review the NIL code.
21:50snowycoder[d]: gfxstrand[d]: huh, what bugs?
21:50snowycoder[d]: I think the only thing missing is 3D sliced and I'm adding it now
21:50gfxstrand[d]: Okay then maybe I'm the one who's confused. :silvy_sweat:
21:51snowycoder[d]: mohamexiety[d]: In the image storage tests: using a view with a different format than the image (but with the same block width)
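A rough sketch of the compatibility condition snowycoder describes. It reads "same block width" as "same number of bytes per texel", which is an assumption, and the `Format` enum and helper names are hypothetical stand-ins rather than NIL or Vulkan types. The byte sizes listed are those of the corresponding Vulkan formats.

```rust
/// A few uncompressed formats, named after their Vulkan counterparts.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Format {
    R32Uint,
    R8G8B8A8Unorm,
    R16G16Sfloat,
    R32G32Uint,
}

/// Size of one texel in bytes for each format above.
fn bytes_per_texel(f: Format) -> u32 {
    match f {
        Format::R32Uint | Format::R8G8B8A8Unorm | Format::R16G16Sfloat => 4,
        Format::R32G32Uint => 8,
    }
}

/// Assumed rule: a view may reinterpret an image when both formats
/// occupy the same number of bytes per texel.
fn can_reinterpret(image: Format, view: Format) -> bool {
    bytes_per_texel(image) == bytes_per_texel(view)
}
```

Under this reading, an `R32Uint` image viewed as `R8G8B8A8Unorm` is a reinterpretation, while viewing it as `R32G32Uint` is not allowed.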
21:52snowycoder[d]: gfxstrand[d]: My method was throwing educated guesses at tests and seeing what worked, so there surely are bugs I'm not aware of
21:56gfxstrand[d]: This is the one that has me really concerned: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34975#note_2922130
21:56mohamexiety[d]: snowycoder[d]: ah.. hm
22:08snowycoder[d]: gfxstrand[d]: Ok, maybe I know why:
22:08snowycoder[d]: - the lower 16 bits represent the tile coordinates
22:08snowycoder[d]: - Z only affects which block we choose
22:08snowycoder[d]: - The tests only have multiple tiles in either the Z or the Y dir?
22:08snowycoder[d]: Ugh, I'll look at this tomorrow
22:09snowycoder[d]: For why we don't use just an add: imadsp also extracts the value from the bitfield, with an add we should also `iand(x, 0xffff)` both operands
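The masking point can be shown with plain integer arithmetic. This sketch does not model `imadsp` itself; it only illustrates why a bare add is not enough when the value of interest lives in the low 16 bits of each operand. `add_low16` and `add_plain` are hypothetical helper names.

```rust
/// Adds the low 16-bit fields of two packed 32-bit values,
/// i.e. `iand(a, 0xffff) + iand(b, 0xffff)`.
fn add_low16(a: u32, b: u32) -> u32 {
    (a & 0xffff) + (b & 0xffff)
}

/// A plain 32-bit add for comparison. Whatever is stored in the upper
/// 16 bits of either operand leaks into the result.
fn add_plain(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}
```

With `a = 0x0003_0010` and `b = 0x0007_0020`, `add_low16` gives `0x30` while `add_plain` gives `0x000a_0030`, so replacing the bitfield-extracting instruction with an add would need the extra `iand` on both operands.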
22:11snowycoder[d]: (Also, I left the descriptor layout as was in codegen, that's why I didn't remove fields)
22:13gfxstrand[d]: Yeah, I suspect it's just that the CTS never uses an image big enough to be a problem and codegen was just never tested well enough.
22:14gfxstrand[d]: One thing you could try is hacking up nil/tiling.rs to force the tile size to the minimum
22:15snowycoder[d]: that shouldn't break too many things I hope
22:15snowycoder[d]: I'll try tomorrow
22:48mohamexiety[d]: snowycoder[d]: yeah it should be fine in that regard, I had to do it before for some similar stuff