00:28pabs: IIRC power is dual-endian, you can choose your endianness on a per-process basis even :)
00:28pabs: IIRC the RaptorCS Talos workstations can run big-endian Debian ppc64 for eg
02:02HdkR: ARMv8 in the spec lets you do the same thing. You can even have your exception handlers switch endian on the fly
02:05HdkR: Of course no one uses it
03:01airlied: TimurTabi: https://people.freedesktop.org/~airlied/scratch/535.104.05.logs.tgz but I don't think it helps
03:02airlied: I've found the base gsp msg struct was changed and checksum, so the gsp might never get a proper msg
04:06airlied: TimurTabi: https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/nouveau-gsp-535.104.05-wip?ref_type=heads has my wip patch for 535.104.05, still not going though
15:57fdobridge: <benjaminl> what's the newest version of the proprietary drivers that people have successfully used with `demmt`?
15:58fdobridge: <karolherbst🐧🦀> uhhh
15:58fdobridge: <benjaminl> I'm trying with `470.199.02`, but am seeing a lot of unrecognized ioctls, including a `size=40` variant of `NVRM_IOCTL_CREATE`...
15:58fdobridge: <karolherbst🐧🦀> very old
15:58fdobridge: <karolherbst🐧🦀> yeah...
15:58fdobridge: <benjaminl> oof
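For context on the `size=40` remark: on Linux the argument size is packed into the ioctl request number itself, so a tracer can tell two variants of the same call apart by decoding that field. A minimal sketch, assuming the common `asm-generic/ioctl.h` layout (8 bits nr, 8 bits type, 14 bits size, 2 bits direction; some architectures such as PowerPC and MIPS use 13 size bits and 3 direction bits). The `ioc` and `decode_ioctl` helpers and the type/nr values are made up for illustration and are not taken from the NVIDIA headers or from `demmt`.

```python
# Field widths as in Linux's asm-generic/ioctl.h (x86, ARM and most others).
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS, IOC_DIRBITS = 8, 8, 14, 2

IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS      # 8
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS  # 16
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS   # 30


def ioc(direction: int, type_: int, nr: int, size: int) -> int:
    """Pack the four fields into a request number (same idea as the _IOC macro)."""
    return (
        (direction << IOC_DIRSHIFT)
        | (size << IOC_SIZESHIFT)
        | (type_ << IOC_TYPESHIFT)
        | (nr << IOC_NRSHIFT)
    )


def decode_ioctl(request: int) -> dict:
    """Split a request number back into its fields."""
    def field(shift: int, bits: int) -> int:
        return (request >> shift) & ((1 << bits) - 1)

    return {
        "dir": field(IOC_DIRSHIFT, IOC_DIRBITS),
        "size": field(IOC_SIZESHIFT, IOC_SIZEBITS),
        "type": field(IOC_TYPESHIFT, IOC_TYPEBITS),
        "nr": field(IOC_NRSHIFT, IOC_NRBITS),
    }


# Hypothetical request: read+write (dir=3), placeholder type 'X', nr 1, 40-byte argument.
example = ioc(3, ord("X"), 0x01, 40)
print(hex(example), decode_ioctl(example))
```

Two requests with the same type and nr but different sizes decode to different request numbers, which is why a tool that only knows one struct layout reports the other as unrecognized.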
15:58fdobridge: <karolherbst🐧🦀> so nobody has been keeping it updated
15:58fdobridge: <karolherbst🐧🦀> and nvidia is now also doing userspace command submission
15:58fdobridge: <karolherbst🐧🦀> so... we kinda need an entirely new approach
15:58fdobridge: <karolherbst🐧🦀> the entire compute side of things has also been different for a while
15:59fdobridge: <karolherbst🐧🦀> also
15:59fdobridge: <karolherbst🐧🦀> valgrind: https://github.com/karolherbst/valgrind/commit/e29d6ef2b3de297f20c7f52756f3de50ad9461ba
15:59fdobridge: <karolherbst🐧🦀> envytools: https://github.com/karolherbst/envytools/commit/24467a5a8948797c911fdc45f22d7faed1fb8b8c
16:00fdobridge: <karolherbst🐧🦀> looks like I used `390.77` for it
16:02fdobridge: <benjaminl> thanks!
16:02fdobridge: <benjaminl> if it's not too much work to explain, what's uvm?
16:02fdobridge: <benjaminl> I haven't seen that one before
16:02fdobridge: <karolherbst🐧🦀> unified virtual memory
16:02fdobridge: <karolherbst🐧🦀> it's a new set of UAPI nvidia uses in their driver for compute
16:03fdobridge: <karolherbst🐧🦀> and they started to use it for compute shaders as well in OpenGL
16:03fdobridge: <karolherbst🐧🦀> it was mostly a CUDA thing before
16:03fdobridge: <karolherbst🐧🦀> anyway.. adding support for all of this is just major pain
16:04fdobridge: <karolherbst🐧🦀> and I think we kinda need a better approach here
16:04fdobridge: <karolherbst🐧🦀> and hey.. nvidia's driver is now open source anyway, so we might as well just hack it up there and find a solution for userspace command submission
16:05fdobridge: <benjaminl> haha yeah, unfortunately my card is Maxwell, so not supported by nvidia-open
16:05fdobridge: <karolherbst🐧🦀> pain
16:07fdobridge: <benjaminl> I got into this in the first place because I was looking at the issue for NVK conservative rasterization in mesa, and was like "hmm, maybe I can try this and learn how RE for the proprietary driver works"
16:08fdobridge: <benjaminl> it's turning out this one is probably beyond my capabilities with my current knowledge 🙂
20:28fdobridge: <gfxstrand> What do you mean by "pixels on X" and "pixels on Y"?
20:29fdobridge: <karolherbst🐧🦀> raster pixel per thread
20:30fdobridge: <gfxstrand> What do you mean by "per thread"?
20:31fdobridge: <karolherbst🐧🦀> I have no idea 😄
20:31fdobridge: <gfxstrand> 😂
20:31fdobridge: <gfxstrand> Awesome!
20:32fdobridge: <karolherbst🐧🦀> though I think in nvidia speak a thread is a single execution unit
20:32fdobridge: <karolherbst🐧🦀> but they also use the term "thread lanes" sometimes
20:32fdobridge: <karolherbst🐧🦀> but a subgroup is called "warp"
20:32fdobridge: <karolherbst🐧🦀> so I guess a thread is a single thread
20:33fdobridge: <karolherbst🐧🦀> maybe just check what the value is and then we'll see what they mean here
21:05fdobridge: <gfxstrand> So it turns out NVIDIA doesn't have any special tricks for `gl_SampleMask` with per-sample shading. 🙄
21:05fdobridge: <karolherbst🐧🦀> sad
21:06fdobridge: <karolherbst🐧🦀> so what are they doing?
21:06fdobridge: <gfxstrand> `COVMASK & (1 << gl_SampleId)` just like codegen (edited)
21:06fdobridge: <karolherbst🐧🦀> ahh yeah
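The `COVMASK & (1 << gl_SampleId)` expression above can be sketched on plain integers: with per-sample shading each invocation keeps only its own bit of the pixel's coverage mask. This is an illustration of the arithmetic only, assuming bit `i` of the mask stands for sample `i`; `per_sample_mask` is a hypothetical helper name, not anything from NVK or codegen.

```python
def per_sample_mask(covmask: int, sample_id: int) -> int:
    """Coverage seen by one per-sample invocation: only its own bit survives."""
    return covmask & (1 << sample_id)


# 4x MSAA pixel where samples 0, 1 and 3 are covered.
covmask = 0b1011
masks = [per_sample_mask(covmask, s) for s in range(4)]
print([bin(m) for m in masks])

# OR-ing the per-sample results back together recovers the full coverage mask.
combined = 0
for m in masks:
    combined |= m
print(bin(combined))
```

An uncovered sample (sample 2 here) gets a mask of zero, and each covered sample sees exactly one bit set.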
21:07fdobridge: <gfxstrand> I mean, I could make it conditional on `passcount > 1` or something maybe (edited)
21:07fdobridge: <karolherbst🐧🦀> mhhh...
21:07fdobridge: <karolherbst🐧🦀> why tho
21:07fdobridge: <gfxstrand> But the driver still has to know to shove passcount to max in that case
21:07fdobridge: <gfxstrand> So there's still API going on
21:07fdobridge: <gfxstrand> Shader key time!
21:07fdobridge: <karolherbst🐧🦀> 😄
21:09fdobridge: <gfxstrand> The million dollar question, though, is what 2 bits do I want in that key?
21:09fdobridge: <gfxstrand> *sigh*
22:01fdobridge: <gfxstrand> Ugh... Looks like the blob doesn't use SamplerPos
22:01fdobridge: <gfxstrand> I'm gonna assume it doesn't do what we think it does...
22:03fdobridge: <gfxstrand> Back to a cbuf it is...
22:03fdobridge: <gfxstrand> 🙄
22:04fdobridge: <karolherbst🐧🦀> ohh... I think I know what it does
22:04fdobridge: <karolherbst🐧🦀> it returns the position of a sample within a MS sample pattern, but I kinda thought that's what we need?
22:05fdobridge: <karolherbst🐧🦀> or uhm...
22:05fdobridge: <karolherbst🐧🦀> maybe not
22:08fdobridge: <karolherbst🐧🦀> sounds like it's for `GL_NV_sample_locations`, specifically `FRAMEBUFFER_PROGRAMMABLE_SAMPLE_LOCATIONS_NV`
22:09fdobridge: <gfxstrand> It looks like it does something real with the texture index
22:09fdobridge: <gfxstrand> So maybe it looks up the number of samples?
22:09fdobridge: <karolherbst🐧🦀> I mean.. the docs say sample index within MS pattern
22:10fdobridge: <karolherbst🐧🦀> as the input
22:10fdobridge: <karolherbst🐧🦀> of the selected texture
22:11fdobridge: <karolherbst🐧🦀> those sample locations are fully programmable, so it kinda makes sense
22:11fdobridge: <gfxstrand> Yeah, but I can't use that on render targets without texture headers for them.
22:11fdobridge: <karolherbst🐧🦀> right..
22:12fdobridge: <karolherbst🐧🦀> the only MS related thing `PIXLD` has is `MY_INDEX`
22:12fdobridge: <karolherbst🐧🦀> and uhm.. `INNER_COVERAGE`
22:12fdobridge: <karolherbst🐧🦀> and others 😄
22:12fdobridge: <gfxstrand> Yeah, inner coverage isn't it
22:12fdobridge: <gfxstrand> What others?
22:13fdobridge: <karolherbst🐧🦀> ohh wait... INNER_COVERAGE says it's independent of MS, so that makes sense
22:13fdobridge: <karolherbst🐧🦀> `MSCOUNT`, `COVMASK`, `MY_INDEX`
22:13fdobridge: <karolherbst🐧🦀> `MY_INDEX` is the current sample index
22:13fdobridge: <karolherbst🐧🦀> heh
22:14fdobridge: <karolherbst🐧🦀> `MY_INDEX` has a fun feature.. it sets the output predicate to true if in MS mode
22:14fdobridge: <gfxstrand> Oh, that is fun...
22:14fdobridge: <gfxstrand> What does "MS mode" mean?
22:15fdobridge: <karolherbst🐧🦀> ehhhh
22:15fdobridge: <gfxstrand> passes > 1
22:15fdobridge: <karolherbst🐧🦀> I misread, it's `SSAA mode`
22:15fdobridge: <gfxstrand> Also, what does PIXLD `78..81` set to 0 do?
22:16fdobridge: <karolherbst🐧🦀> is 78..81 the mode?
22:16fdobridge: <gfxstrand> Yeah
22:16fdobridge: <karolherbst🐧🦀> probably doing `MSCOUNT`
22:16fdobridge: <gfxstrand> It's a valid encoding according to the disassembler
22:16fdobridge: <karolherbst🐧🦀> it drops the mode in the output?
22:16fdobridge: <gfxstrand> Ah, MSCOUNT could be easy
22:16fdobridge: <gfxstrand> For 0
22:16fdobridge: <gfxstrand> The others decode to things
22:17fdobridge: <karolherbst🐧🦀> okay, does anything decode to `MSCOUNT`?
22:17fdobridge: <gfxstrand> No
22:17fdobridge: <karolherbst🐧🦀> okay..
22:17fdobridge: <karolherbst🐧🦀> then 0 is `MSCOUNT` 😄
22:17fdobridge: <karolherbst🐧🦀> it has a star on it meaning default
22:17fdobridge: <gfxstrand> 5, 6, and 7 decode to `???6` etc.
22:17fdobridge: <karolherbst🐧🦀> ahh yeah
22:17fdobridge: <karolherbst🐧🦀> so it's not implemented
22:17fdobridge: <karolherbst🐧🦀> or not existing
22:18fdobridge: <gfxstrand> I figured
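The "what does mode 0 decode to" experiment above comes down to reading a bit field out of an instruction word. A minimal sketch, assuming the instruction is handled as one integer with bit 0 as the least significant bit and that `78..81` is an inclusive range (4 bits); both are assumptions, as are the helper names. The only table entry is the `MSCOUNT` guess from the chat, which is an inference and not a confirmed encoding.

```python
def extract_bits(insn: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of insn, shifted down to bit 0."""
    width = hi - lo + 1
    return (insn >> lo) & ((1 << width) - 1)


# Assumed field position for the PIXLD mode, per the chat ("78..81").
PIXLD_MODE_LO, PIXLD_MODE_HI = 78, 81

# Only entry taken from the conversation, and only as an inference.
GUESSED_MODE_NAMES = {0: "MSCOUNT"}


def pixld_mode(insn: int) -> int:
    """Read the mode field from an instruction word."""
    return extract_bits(insn, PIXLD_MODE_LO, PIXLD_MODE_HI)


def describe_mode(insn: int) -> str:
    """Name the mode if the chat offered a guess for it, else show the raw value."""
    mode = pixld_mode(insn)
    return GUESSED_MODE_NAMES.get(mode, f"unknown({mode})")


# Synthetic instruction words, not real PIXLD encodings.
print(describe_mode(0))
print(describe_mode((5 << 78) | 0xFFFF))
```

Writing each candidate value into the field and running the result through a disassembler is the same loop in the other direction.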
22:41fdobridge: <gfxstrand> Okay, now to figure out why alpha-to-coverage doesn't work...
23:09fdobridge: <gfxstrand> Seems to work when there are attachments.
23:10fdobridge: <gfxstrand> Do I need a dummy attachment or something? That would be frustrating.
23:52fdobridge: <gfxstrand> There's `RASTER_SAMPLES` but it doesn't seem to do anything and it complains when I set it to 0 (1x1)
23:52fdobridge: <gfxstrand> I may need to try dumping command buffers from the blob
23:54fdobridge: <gfxstrand> It's just such a giant PITA