00:00 fdobridge: <k​arolherbst🐧🦀> isn't that mme macro just `NVK_MME_SET_PRIV_REG`?
00:01 karolherbst: benjaminl: did you find that mme macro you use in nvidia traces? Just curious
00:01 karolherbst: because that one is just poking an MMIO reg
00:02 karolherbst: and it looks too simple compared to what nvidia generally does when doing that
00:23 benjaminl: hmm, I originally got it from a trace, but looking back I think you're right and the trace is doing something more complicated
00:24 benjaminl: I was also looking at the mme macros from the gallium nouveau drivers, which don't include the loop bit from set_priv_reg, but I think the gallium driver might just be wrong now
00:28 benjaminl: yup, the trace is doing all the same steps that set_priv_reg is. good catch!
00:30 karolherbst: nice
00:30 karolherbst: but I wasn't aware of the gallium driver setting an mmio register already :D
00:31 karolherbst: that code should probably also be moved to the SET_PRIV_REG macro I've added for a different thing there...
00:31 karolherbst: benjaminl: but to be clear: neither I nor, probably, anybody else outside of nvidia has any idea what that additional code is doing, except maybe waiting for the firmware to ack the request
00:31 karolherbst: which is my best bet
00:31 karolherbst: and uhh.. I think it falls back to a busy wait based on some condition
00:32 benjaminl: the WAIT_FOR_IDLE bit seems particularly confusing to me
00:32 karolherbst: but anyway.. good to know that nvidia is using the same macro for other mmio registers
00:33 karolherbst: why?
00:33 karolherbst: it just makes sure nothing is running on the context before flipping that mmio register
00:33 karolherbst: I'm sure that flipping it randomly can mess up state or ongoing work
00:33 benjaminl: ah, that makes sense
00:33 karolherbst: so some of those MMIO registers are context-switched privileged state/flags that change behavior
00:34 karolherbst: like the reason I've added that code was to flip a bit to enable memory loads for FP helper invocations
00:34 karolherbst: as you can imagine, doing that mid shader execution could cause some issues.. probably
00:34 benjaminl: another fun bit is that nvidia isn't exactly using the _same_ macro. There's a separate macro for conservative rasterization that has the set_priv_reg code inlined
00:34 benjaminl: with that 0x418800 register hardcoded
00:34 karolherbst: ahh
00:35 karolherbst: and I suspect the bitmask is also hardcoded
00:35 karolherbst: maybe it wins them 5ns of latency somewhere
00:38 benjaminl: here's the bit from gallium: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/drivers/nouveau/nvc0/mme/com9097.mme?ref_type=heads#L570
00:38 benjaminl: I can make a separate MR to replace that with set_priv_reg if that would be useful
00:40 karolherbst: not sure it's worth the effort, but maybe it fixes some super subtle bug somewhere
00:48 benjaminl: hmm, the nvidia driver is also storing the previous register value in scratch space and skipping the write if it hasn't changed
00:49 karolherbst: interesting.. I didn't dig into how we can actually read out the registers here
00:49 karolherbst: so that would be good to know
00:50 benjaminl: I don't think we have any reason to do that rather than just using dyn->dirty to track it cpu side
00:50 karolherbst: yeah...
00:50 benjaminl: however it probably _is_ worth skipping the write when conservativeMode changes but extraPrimitiveOverestimationSize doesn't
00:50 karolherbst: I question the actual overhead here anyway. Though would be good to have some numbers
00:51 benjaminl: nvidia isn't actually reading the firmware register, it just stores the value it's writing to the register at a position in the MME scratch memory
00:52 benjaminl: doesn't WAIT_FOR_IDLE cause a stall?
00:52 karolherbst: in theory
00:52 karolherbst: so yeah, I guess skipping the WFI might actually make sense
00:53 karolherbst: but nvk is atm so bad at it, it won't matter for a while
00:53 benjaminl: it wouldn't be hard to do, just split the if statement in half :)
00:53 karolherbst: yeah...
00:53 benjaminl: there's a theoretical loss if somebody updates extraPrimitiveOverestimationSize multiple times while conservativeMode is disabled, but I _really_ don't think that matters
00:55 karolherbst: I think what might make sense is to just add that if to the general SET_PRIV_REG macro and just call into that if we need it elsewhere as this really shouldn't be hard to implement
00:56 karolherbst: I'm just surprised it's not in the general macro...
00:56 karolherbst: I mean.. on nvidia's side
00:58 benjaminl: adding it to the general SET_PRIV_REG macro is harder because it can write to any register
00:58 benjaminl: so you'd have to reserve space for tracking the state of all of them, or figure out how to read the registers directly
00:58 karolherbst: you could also just read the register, and then see if the operation would change the value
00:59 benjaminl: do you know how to read them?
00:59 karolherbst: though I guess reading it would also add more overhead
00:59 karolherbst: ohhh.. I thought you meant that nvidia reads it back, so I guess they just stash the value in scratch instead?
00:59 benjaminl: yeah, exactly
00:59 karolherbst: I see
00:59 benjaminl: sorry, I probably could have been clearer
00:59 karolherbst: mhhh
01:00 karolherbst: could potentially add another argument to the macro which is the old masked value
01:00 karolherbst: and then callers make use of it.. or not
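[editor's note] The write-skipping idea discussed above (remember the last value written and skip both the WAIT_FOR_IDLE and the register write when nothing changed) can be sketched in C. This is only an illustration under assumptions: `struct fake_gpu`, `set_priv_reg`, and `set_priv_reg_cached` are hypothetical names that model the behavior on the CPU. They are not the real NVK macro, MME code, or hardware interface, and the sketch tracks a single register with a fixed mask, which is why it sidesteps the "reserve space for all registers" problem benjaminl raises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the state involved; not a real driver structure. */
struct fake_gpu {
    uint32_t priv_reg;       /* stands in for one privileged MMIO register */
    uint32_t scratch_shadow; /* last masked value written, kept in "scratch" */
    bool     shadow_valid;   /* false until the first write has happened */
    unsigned wfi_count;      /* how many WAIT_FOR_IDLE stalls were issued */
};

/* Unconditional path: always stalls, then does a masked register update. */
static void set_priv_reg(struct fake_gpu *gpu, uint32_t value, uint32_t mask)
{
    assert(gpu != NULL);
    gpu->wfi_count++; /* models the WAIT_FOR_IDLE before touching the register */
    gpu->priv_reg = (gpu->priv_reg & ~mask) | (value & mask);
}

/*
 * Cached path: skip the stall and the write when the masked value matches
 * what was written last time. Assumes callers always pass the same mask
 * for this register. Returns true if a write was actually issued.
 */
static bool set_priv_reg_cached(struct fake_gpu *gpu, uint32_t value,
                                uint32_t mask)
{
    assert(gpu != NULL);
    uint32_t masked = value & mask;

    if (gpu->shadow_valid && gpu->scratch_shadow == masked)
        return false; /* unchanged: no WAIT_FOR_IDLE, no register write */

    set_priv_reg(gpu, value, mask);
    gpu->scratch_shadow = masked;
    gpu->shadow_valid = true;
    return true;
}
```

Calling `set_priv_reg_cached` twice in a row with the same value issues one stall instead of two. Whether that saving is measurable in practice is exactly the open question raised in the discussion.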
01:15 fdobridge: <g​fxstrand> Gotta love how crucible and the NVIDIA blob driver seem fundamentally incompatible. 🙄 Dies on something TLS-related inside the NV driver.
01:15 fdobridge: <k​arolherbst🐧🦀> maybe the threading hack they are doing?
01:16 fdobridge: <k​arolherbst🐧🦀> though not sure how much that's a vulkan thing. I know that their GL driver tries its best to not lock if the application isn't actually doing threaded GL
01:19 fdobridge: <g​fxstrand> Found it. One of my hacks was stomping memory
03:58 fdobridge: <g​fxstrand> So... I'm pretty sure I know why those depth tests are failing and I'm pretty sure it's a hardware bug and I'm pretty sure the blob driver is hacking around the test.
03:58 fdobridge: <g​fxstrand> So the question becomes what do we do
04:03 airlied: hack around the test :)
04:04 fdobridge: <a​irlied> discord me says we should publish a phoronix article accusing nvidia of cheating (edited)
04:07 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Or emulate that feature in software 😅
04:07 fdobridge: <g​fxstrand> No, I'm not going to emulate depth bias in software...
04:07 fdobridge: <g​fxstrand> I mean, I could but no
04:09 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Not even in a compute shader? NAK's CS support seems to be pretty good
04:10 fdobridge: <g​fxstrand> It's not practical. I mean, I could probably do something in a geometry shader if I really cared.
04:10 fdobridge: <g​fxstrand> But yuck
04:19 fdobridge: <a​irlied> @gfxstrand you could also try and write a better test they can't work around
04:25 fdobridge: <g​fxstrand> What's happening, as far as I can tell, is that the NVIDIA clipper clips after depth bias has been applied but the depth bias is technically supposed to be applied post-clipping (or so it is in the Vulkan pipeline). Most of the time, you'd never notice this in an app but this CTS tickles that case and you end up with pixels rendered that shouldn't be if bias is applied too early.
04:25 fdobridge: <g​fxstrand> In the traces from @airlied, they're setting bias to 0. If I hack NVK to smash bias to 0, that causes the test to pass but it's not really a valid programming of the hardware given the inputs we're getting in the test.
04:25 fdobridge: <g​fxstrand> I wrote a very targeted crucible test today and have been fuzzing things a bit and I still haven't been able to trigger this behavior. IDK what's going on.
04:25 fdobridge: <g​fxstrand> That would be the passive-aggressive solution. 😂
04:25 benjaminl: extort them for docs /s
05:17 fdobridge: <m​henning> maybe the clip planes can be adjusted when we set depth bias?
05:38 fdobridge: <g​fxstrand> I don't think you really can. The slope means you don't know how much bias will actually be applied.
05:41 fdobridge: <m​henning> oh, right. that's annoying
17:15 fdobridge: <g​fxstrand> @airlied what driver version are you running? Piers said there might be a bias regression.
19:15 fdobridge: <g​fxstrand> Okay, so the depth clip thing isn't a hack. There's something funky going on in clip configuration
19:16 fdobridge: <g​fxstrand> They're getting a bias, they're just not clipping post-bias
19:16 fdobridge: <g​fxstrand> On both NVIDIA and ANV, I see Z = 0.65050739 unclipped when the Z range is set to [0.35, 0.65].
19:17 fdobridge: <g​fxstrand> With NVK, that pixel gets clipped
19:17 fdobridge: <g​fxstrand> If I move one pixel over, I see the same value in all three drivers: Z = 0.648462653
19:17 fdobridge: <g​fxstrand> So... something wrong with my Z clip settings? 🤷🏻‍♀️
19:18 fdobridge: <g​fxstrand> Maybe there's another chicken bit we're missing?
19:18 fdobridge: <k​arolherbst🐧🦀> maybe another magic mmio reg?
19:18 fdobridge: <k​arolherbst🐧🦀> might make sense to scan for those
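[editor's note] The ordering question in this thread (is the Z range test applied before or after depth bias?) can be illustrated with a small C sketch. It is a simplified model under assumptions: the bias is treated as a single constant offset with no slope term, the function names are hypothetical, and nothing here reflects how the hardware or any driver actually implements clipping.

```c
#include <assert.h>
#include <stdbool.h>

/* True if z lies inside the closed depth range [zmin, zmax]. */
static bool in_range(float z, float zmin, float zmax)
{
    assert(zmin <= zmax);
    return z >= zmin && z <= zmax;
}

/*
 * Range test on the unbiased depth; bias is applied afterwards.
 * A fragment whose biased depth lands just outside the range still survives.
 */
static bool survives_clip_then_bias(float z, float bias, float zmin, float zmax)
{
    (void)bias; /* bias does not influence the clip decision in this ordering */
    return in_range(z, zmin, zmax);
}

/*
 * Bias is applied first; the range test sees the biased depth.
 * A fragment pushed past zmax by the bias is discarded.
 */
static bool survives_bias_then_clip(float z, float bias, float zmin, float zmax)
{
    return in_range(z + bias, zmin, zmax);
}
```

With a made-up unbiased depth of 0.6495, a made-up bias of 0.001, and the range [0.35, 0.65] from the messages above, the first ordering keeps the fragment and the second discards it. That matches the shape of the observation here: a biased Z slightly above 0.65 survives on NVIDIA and ANV but is clipped on NVK.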
19:35 fdobridge: <a​irlied> driverVersion = 535.104.5.320 (2245656896)
20:43 uis: I don't remember if I asked it here, but why does following xorg on mastodon need their approval?
20:43 karolherbst: prolly because of bad people
20:56 uis: Huh. Apparently I can follow from another instance just fine
21:00 karolherbst: yeah.. because of some bad people on your main instance
21:01 karolherbst: which one did you try from at first?
21:01 uis: pone.social
21:04 karolherbst: ahh.. a free speech zone, yeah, that will get that instance silenced real quick
21:05 karolherbst: also see some fediblock requests against that instance
21:15 uis: It is a 20-something people instance...
21:17 uis: Also I remember seeing in Silk project that there was something about instance misconfiguration, but it was fixed.
21:23 karolherbst: it apparently has 80 users
21:23 karolherbst: but a "free-speech" zone is a "free-speech" zone and the server rules have multiple red flags
21:23 uis: *active people
21:24 karolherbst: so I'm not at all surprised that somebody caused problems and the word got around
21:24 karolherbst: I don't know what happened
21:24 karolherbst: but looking at the server I'm not surprised it's silenced/restricted (which generally is the reason why follow requests need to be acked)
21:43 uis: Didn't expect that anyone would block a pony-themed instance
21:44 karolherbst: the issue isn't that it's pony themed
21:45 karolherbst: when a server has "This is not a safe space; you may be offended by the things you see here; we will not regulate anyone's speech" in its rules, you have to expect that this server gets blocked on sight, as is the right of any instance admin
21:45 karolherbst: that's all
21:46 karolherbst: and I don't even disagree with it