00:05 airlied[d]: the new tic needs ZS/SZ/ZFS encodings
00:05 airlied[d]: I've pushed it to my nvk-wip-gb20x branch
00:05 airlied[d]: now to see if I can find a texture
00:05 gfxstrand[d]: blarg
00:06 airlied[d]: hmm I think texturing worked, let me go see what I screwed up 🙂
00:06 gfxstrand[d]: Okay, that's easy enough to add and translate to the old thing
00:07 gfxstrand[d]: Let me do a bit of typing
00:08 airlied[d]: okay looks like that makes basic texturing test pass
00:08 airlied[d]: dEQP-VK.glsl.texture_functions.texture.sampler2d_fixed_fragment
00:09 gfxstrand[d]: 🎉
00:27 gfxstrand[d]: Okay, yeah, this is going to take more thinking
00:27 gfxstrand[d]: Do we know what SGNRGB and SGNA are?
00:27 gfxstrand[d]: SRGB, maybe?
00:27 gfxstrand[d]: And what about DSDT?
00:27 airlied[d]: no idea, sgn could be sign
00:28 gfxstrand[d]: Also, there are now component sizes for SRGB
00:28 airlied[d]: I assume once I get to image tests it'll be a bit more mixing/matching
00:29 gfxstrand[d]: Okay, let's carry on with your extra column for now and we can figure out a proper mapping once we have more stuff passing and understand the hardware better.
00:38 airlied[d]: I think the texture srcs ordering probably changes, since I guess the handle has to be in a ur now
00:40 mhenning[d]: airlied[d]: I think there are SULD/SUST/SUATOM encoding changes that i haven't fixed yet
00:43 mhenning[d]: airlied[d]: My hunch was that the bindless instructions were the same except for maybe bias/clamp, but that's just a guess
00:44 mhenning[d]: We're not setting the UR anywhere yet, so it's encouraging if some of the texture tests pass
00:47 airlied[d]: I'm setting the UR in my local hacks
00:49 gfxstrand[d]: airlied[d]: Setting it to rZ?
00:51 airlied[d]: nope setting it to the register from the ldcu.texunpack
00:52 mhenning[d]: Oh, are you still encoding it as .B then?
00:52 airlied[d]: setting bit 91 to true
00:53 airlied[d]: https://paste.centos.org/view/raw/d34812df I've lots of local experiments, not sure what is actually correct yet
00:53 airlied[d]: I think probably need to add another arg to the nir texture instruction to do it properly
01:20 gfxstrand[d]: airlied[d]: Adding a new NIR backend source is easy enough.
01:24 airlied[d]: looks like offsets also go into a ur
01:24 airlied[d]: uaoffi bit
03:46 airlied[d]: hmm 4/5/6 bits are broken, and some zs busted
04:19 gfxstrand[d]: There's a part of me that wonders if we should separate texture and render format tables and then just have a separate table for Hopper+
04:19 gfxstrand[d]: Seems a bit scorched earth
04:20 gfxstrand[d]: But also, render has never 100% fit in that table. In a lot of ways, it's more akin to how we handle VB formats since it runs through a totally different HW unit.
06:21 airlied[d]: okay some bits of offset support pushed, adds a few more passes, but still doesn't get all the way
07:21 airlied[d]: gfxstrand[d]: okay pushed out where I got to today, dEQP-VK.glsl.texture_functions.texture.*fragment 8 fails left, all shadow,
07:21 airlied[d]: dEQP-VK.texture.filtering.2d.formats.* has 4/5/6 bit and ds fails
12:09 snowycoder[d]: Quick question, kepler suld needs surface info stored in cbufs (width, height, ...).
12:09 snowycoder[d]: What code should I edit to fill the cbufs?
14:13 gfxstrand[d]: snowycoder[d]: Wait, what?
14:13 gfxstrand[d]: It can't do bindless?
14:19 marysaka[d]: kepler not handling bindless kind of rings a bell...
14:38 gfxstrand[d]: Fermi doesn't have bindless at all
14:38 gfxstrand[d]: But I thought Kepler did
14:40 marysaka[d]: hmm okay no Kepler should have that
14:41 gfxstrand[d]: I'm not sure about Kepler A
14:43 gfxstrand[d]: According to https://vulkan.gpuinfo.org/displayreport.php?id=36868#properties_extensions, Kepler B can definitely do descriptor indexing (so bindless)
15:16 snowycoder[d]: gfxstrand[d]: It can but it seems to encode info on the pointer to then do clamp and lea manually for surfaces (textures work fine)
15:23 gfxstrand[d]: Oh, yeah, codegen emits piles of BS.
15:23 gfxstrand[d]: Most of it isn't actually needed by NVK
15:24 gfxstrand[d]: Just focus on the suld itself
15:34 gfxstrand[d]: But we do need to figure out the bindless form which I'm not sure codegen handles currently.
15:37 gfxstrand[d]: It does that to work around weirdness with 3D textures. We handle that in NVK by turning them into 2D arrays.
15:38 gfxstrand[d]: Also, go ahead and rebase. I landed my format patches last night.
16:05 karolherbst[d]: gfxstrand[d]: at least in vulkan you can do that sanely
16:14 gfxstrand[d]: Yeah, we know up-front if it's going to be used for storage
16:15 gfxstrand[d]: In GL, you would have to either copy on first storage use or do heroics
16:16 gfxstrand[d]: I think the HW might have improved for 3D storage around Turing or so but I never bothered trying to make it work because 3D storage perf is a big meh.
16:16 gfxstrand[d]: Oh, right... We did have to figure that all out for sparse.
16:16 gfxstrand[d]: Because you can't do the 2D array trick for sparse.
16:16 gfxstrand[d]: But Kepler doesn't have sparse so we're fine there.
16:17 karolherbst[d]: heh
17:18 snowycoder[d]: gfxstrand[d]: Even nvcc for surfaces does a weird dance with suclamp, suclamp.sd, imadsp, subfm, sueau before using suld to load the texture.
17:18 snowycoder[d]: In codegen we use the low 9 bits to index into a table of bindless surfaces in cbuf-space (max: 511 textures) (see loadSuInfo32 / processSurfaceCoordsNVE4)
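The handle-to-table lookup described above can be sketched roughly as follows. This is only an illustration of the "low 9 bits index a table" idea, not codegen's actual code: `SurfaceInfo`, `surface_index` and `lookup_surface` are hypothetical names, and the two-field record is a stand-in for whatever `NVC0_SU_INFO_*` really describes.

```rust
/// Number of low handle bits used as the table index (taken from the message above).
const SU_INDEX_BITS: u32 = 9;
const SU_INDEX_MASK: u32 = (1 << SU_INDEX_BITS) - 1;

/// Hypothetical per-surface record. The real layout is described by
/// NVC0_SU_INFO_* in codegen and is NOT reproduced here.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SurfaceInfo {
    pub width: u32,
    pub height: u32,
}

/// Extract the table index from the low bits of a surface handle.
pub fn surface_index(handle: u32) -> usize {
    (handle & SU_INDEX_MASK) as usize
}

/// Look up the surface record for a handle, or None if the index
/// is past the end of the table.
pub fn lookup_surface(table: &[SurfaceInfo], handle: u32) -> Option<&SurfaceInfo> {
    table.get(surface_index(handle))
}
```

With a 9-bit mask, `surface_index` can only ever return values from 0 through 511, whatever the upper handle bits contain.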
18:04 gfxstrand[d]: Right. So they might be doing the array offset manually or something.
18:04 gfxstrand[d]: We can do that if needed. It just needs to be done in NVK, not NAK.
18:13 karolherbst[d]: yeah.. manual address calculation is needed afaik
18:21 gfxstrand[d]: As long as it's only to the array slice, that shouldn't be horrible. We'll have to bloat the descriptor a bit but oh, well.
18:34 karolherbst[d]: at least it's not fermi, I think it was way worse there
18:37 gfxstrand[d]: We've already bloated the storage image descriptor to 8B. We can probably increase it to 16 without breaking anything.
18:37 gfxstrand[d]: snowycoder[d]: What are these suclamp and sueau instructions?
18:39 gfxstrand[d]: Oh, boy... Just found them in an older version of the CUDA docs
18:40 gfxstrand[d]: Okay, yeah, we might need to add some new NIR ops and some lowering.
18:40 snowycoder[d]: So, correct me if I'm wrong:
18:40 snowycoder[d]: old codegen seems to map TXC (TIC and TSC) directly to CBUFS; indirect texture handles passed to shaders are stored inside those (that's why codegen has an additional indirect load).
18:40 snowycoder[d]: After the handle pointer there is additional info as described by `NVC0_SU_INFO_*`.
18:40 snowycoder[d]: Right?
18:41 snowycoder[d]: I'm sorry I just know nothing of how an Nvidia GPU really works
18:41 snowycoder[d]: gfxstrand[d]: WHERE, that's pure gold!
18:41 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1359236824001941574/image.png?ex=67f6bf70&is=67f56df0&hm=b118f9c4a8096467b7dc8c25e292d8dea79574033f313f18774fa8b7c60d4312&
18:41 gfxstrand[d]: It's not as helpful as you might think:
18:41 gfxstrand[d]: That's literally all we've got.
18:42 gfxstrand[d]: https://docs.nvidia.com/cuda/archive/10.2/pdf/CUDA_Binary_Utilities.pdf
18:42 gfxstrand[d]: But it at least tells us what they do (sort of) and that's a hell of a lot better than nothing!
18:43 gfxstrand[d]: I'm especially curious about SUBFM
18:43 gfxstrand[d]: Can you paste the sequence you're seeing come out of nvcc?
18:45 gfxstrand[d]: snowycoder[d]: Don't sweat it. I know nothing about how this works on Kepler so we're learning together here.
18:50 snowycoder[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1359238872298684536/hello.cu?ex=67f6c158&is=67f56fd8&hm=6d5b716d1a1b92fb417b81849f7e90ca3ca9cadb61f914c50bbdc1af561db652&
18:50 snowycoder[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1359238872688496911/out.txt?ex=67f6c158&is=67f56fd8&hm=9208854c093c13d1f5c3e371150929244650b4453ccd78137ebebe7746d2f893&
18:50 snowycoder[d]: gfxstrand[d]: that's the most basic in/out I've seen
18:50 snowycoder[d]: (I have a weird mixed toolchain though)
18:50 snowycoder[d]: don't ask me what .sd means (store double?) it seems it can either be SD|PL|BL
18:56 gfxstrand[d]: Assuming `suclamp` doesn't actually reference the surface state (it doesn't look like it does), we should be able to write a unit test to R/E its behavior.
18:58 gfxstrand[d]: I think that's where I'd start.
19:01 gfxstrand[d]: Also, taking a wild stab in the dark here but I think the predicate on `suclamp` is probably whether or not the access is OOB
19:02 gfxstrand[d]: But we should verify that with unit tests, too
19:03 gfxstrand[d]: (To be clear, by "unit test", I mean add the op, implement `Foldable` for it as whatever we guess it is, add the unit test and then tweak the `Foldable` implementation until it passes.)
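The guess-then-tweak loop described above might look something like this as a standalone sketch. It does not use NAK's real `Foldable` trait or test harness. `guess_suclamp`, `Observation` and `mismatches` are hypothetical names, and the clamp-plus-OOB-predicate behaviour is purely a first guess based on the speculation above, not known hardware semantics.

```rust
/// GUESS, not verified against hardware: clamp a signed coordinate into
/// [0, size - 1] and report whether the input was out of bounds.
pub fn guess_suclamp(coord: i32, size: u32) -> (u32, bool) {
    let max = size.saturating_sub(1);
    if coord < 0 {
        (0, true)
    } else if coord as u32 > max {
        (max, true)
    } else {
        (coord as u32, false)
    }
}

/// One input/output pair captured by running the real instruction.
pub struct Observation {
    pub coord: i32,
    pub size: u32,
    pub result: u32,
    pub oob: bool,
}

/// Return the indices of observations that the current guess gets wrong.
/// An empty result means the model matches everything captured so far.
pub fn mismatches(observations: &[Observation]) -> Vec<usize> {
    observations
        .iter()
        .enumerate()
        .filter(|(_, o)| guess_suclamp(o.coord, o.size) != (o.result, o.oob))
        .map(|(i, _)| i)
        .collect()
}
```

The idea would be to fill a list of `Observation` values from actual GPU runs, call `mismatches`, and keep adjusting `guess_suclamp` until the list comes back empty.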
23:30 airlied[d]: it does appear the ur for texture handle needs to be 64-bit aligned at least
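A small sketch of what that alignment constraint could mean for register allocation, assuming uniform registers are 32 bits wide so that "64-bit aligned" translates to an even register index. Both helper names are hypothetical and nothing here is taken from NAK.

```rust
/// Assumption: each uniform register is 32 bits wide, so a 64-bit aligned
/// pair has to start at an even register index.
pub fn is_64bit_aligned(ur_index: u32) -> bool {
    ur_index % 2 == 0
}

/// Round a uniform register index up to the next even index.
/// Returns None if rounding up would overflow.
pub fn align_up_to_pair(ur_index: u32) -> Option<u32> {
    ur_index.checked_add(1).map(|v| v & !1)
}
```

An allocator could call `align_up_to_pair` on its next free index before handing out the register pair that holds the texture handle.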