IRC Logs of #nouveau on irc.freenode.net for 2024-05-23

00:07 fdobridge: <redsheep> Does this segfault here mean it was told to wait on a fence that doesn't exist? I'm not finding much else that's useful poking around with GDB commands
00:29 fdobridge: <mhenning> My first guess would be that the array indices are out of bounds if it's segfaulting on that line
00:34 fdobridge: <mhenning> It might be useful to upload the full stack trace for that. In gdb, that's `set logging on` followed by `bt full`, which will then save the full backtrace to a file (iirc called gdb.txt in the current directory) that you can attach to the bug report
00:36 fdobridge: <redsheep> Would that yield something more than the stack trace I attached on the issue? https://gitlab.freedesktop.org/mesa/mesa/-/issues/11161
00:38 fdobridge: <mhenning> Yes, it will have a lot more detail than what you've already posted
00:46 fdobridge: <redsheep> Thanks, posted that to the issue. Hopefully it helps.
00:49 airlied: waitSemaphoreCount = 1, pWaitSemaphores = 0x0 seems suboptimal :-P
00:50 fdobridge: <redsheep> Oh. Yeah lol that's odd
00:54 fdobridge: <airlied> also seems like it should be impossible
00:56 fdobridge: <airlied> is that a release or debug build, my reading is there's an assert that should fire first
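For context on airlied's observation: the Vulkan valid-usage rules for `VkSubmitInfo` require that a non-zero `waitSemaphoreCount` come with a valid `pWaitSemaphores` array holding that many handles, so a count of 1 with a NULL pointer should never reach the driver. Below is a minimal Python sketch of that check. `SubmitInfo` and `check_wait_semaphores` are hypothetical names, `None` stands in for a NULL pointer, and this is not the actual assert in Mesa or zink.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SubmitInfo:
    """Hypothetical stand-in for the two VkSubmitInfo fields seen in the backtrace."""
    wait_semaphore_count: int
    # None models a NULL pWaitSemaphores pointer.
    wait_semaphores: Optional[Sequence[int]]


def check_wait_semaphores(info: SubmitInfo) -> None:
    """Raise if a non-zero count is paired with a missing or too-short array."""
    if info.wait_semaphore_count == 0:
        return  # the array is never read, so NULL is allowed
    if info.wait_semaphores is None:
        raise ValueError("waitSemaphoreCount > 0 but pWaitSemaphores is NULL")
    if len(info.wait_semaphores) < info.wait_semaphore_count:
        raise ValueError("pWaitSemaphores is shorter than waitSemaphoreCount")
```

With this check, `SubmitInfo(1, None)` (the state in the backtrace) raises, while `SubmitInfo(0, None)` passes.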
00:57 fdobridge: <redsheep> It is a debug unstripped build with lto on for mesa main at cdf75e8e02cdf68679a18db0a9141d75d162f047
00:57 fdobridge: <redsheep> So, the very latest
00:57 airlied: might be an lto bug
00:58 fdobridge: <redsheep> I can try shutting it off, I suppose
00:58 fdobridge: <airlied> would be a useful datapoint, I'm not seeing this in fedora plasma desktop
00:58 fdobridge: <redsheep> Ok 👍 new build incoming
01:03 fdobridge: <redsheep> Oh by the way not seeing it immediately doesn't mean a whole lot. I've gone 3-4 hours without the crash before, and then had it crash like 4 times in 5 minutes
01:03 fdobridge: <redsheep> Not really much rhyme or reason, sometimes I crash when doing nothing at all
01:04 fdobridge: <redsheep> Out of curiosity, are you able to reproduce not getting a working Wayland session on a Plasma Fedora desktop?
01:06 fdobridge: <mhenning> It looks like that queue can be disabled with ZINK_DEBUG=flushsync, so I'd also be curious if it still crashes with that option (after you test with lto off)
01:07 fdobridge: <airlied> I've only done the run sddm, login, play around, leave it running firefox on my laptop
01:07 fdobridge: <redsheep> Ok I am booted into a !lto build now and I already have discord corruption but I haven't crashed my session just yet
01:08 fdobridge: <redsheep> I can kind of try to provoke it by just really aggressively opening things
01:08 fdobridge: <redsheep> Ah, flicker is still here too so that one is also not lto
01:12 fdobridge: <redsheep> Ok I got a fresh crash, let me verify the backtrace has the same issue and then I will test ZINK_DEBUG=flushsync
01:13 fdobridge: <redsheep> Yes it still broke the same way, so that rules out LTO
01:19 fdobridge: <airlied> try reverting 738fbddca8a1d8343e2ae322299de22a9ae108ae
01:21 fdobridge: <redsheep> Hmm I just felt my session freeze for a couple seconds like it was crashing, but then it didn't.
01:22 fdobridge: <redsheep> Oh, weird, it timed out in dmesg. That could just be something else, I was really stressing it.
01:22 fdobridge: <redsheep> ```nouveau 0000:01:00.0: Render thread[79935]: job timeout, channel 336 killed!```
01:24 fdobridge: <redsheep> Just happened again. So ZINK_DEBUG=flushsync is leading to channel killed quite a bit, but flicker and full on session crashes appear to be gone.
01:26 fdobridge: <redsheep> With the way I am testing, I think reverting a commit would be quite tricky. Maybe I can aim my PKGBUILD at a local branch? I've never pointed my installed build at something that isn't on the GitLab
01:29 fdobridge: <mhenning> @airlied Oh, yeah that looks wrong - I don't see enough locking to call slab_alloc_st/slab_free_st on different threads like that
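mhenning's point, in general terms: an allocator whose `_st` variants assume single-threaded use cannot be called from two threads without some external lock. A minimal Python sketch of that pattern follows. `SingleThreadedPool` and `LockedPool` are hypothetical classes, not Mesa's `util/slab` API, and wrapping in a lock is only the generic remedy, not a description of what the linked MR changes.

```python
import threading
from typing import List


class SingleThreadedPool:
    """Hypothetical free-list pool; like an *_st allocator it does no locking itself."""

    def __init__(self) -> None:
        self._free: List[int] = []
        self._next_id = 0

    def alloc(self) -> int:
        # Reuse a freed object if possible, otherwise mint a new one.
        if self._free:
            return self._free.pop()
        self._next_id += 1
        return self._next_id

    def free(self, obj: int) -> None:
        self._free.append(obj)


class LockedPool:
    """Serializes every alloc/free so the pool can be shared across threads."""

    def __init__(self) -> None:
        self._pool = SingleThreadedPool()
        self._lock = threading.Lock()

    def alloc(self) -> int:
        with self._lock:
            return self._pool.alloc()

    def free(self, obj: int) -> None:
        with self._lock:
            self._pool.free(obj)
```

Without the lock, two threads can interleave inside `alloc`/`free` on the shared free list, which is the kind of race an unlocked `_st` call from another thread invites.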
01:33 fdobridge: <airlied> summon the @zmike.
01:39 fdobridge: <zmike.> WHO DARES SUMMON ME
01:41 fdobridge: <airlied> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29338
01:41 fdobridge: <airlied> there is only zuul
01:41 fdobridge: <redsheep> Do you want me to test reverting the commit, or the branch from that MR now?
01:42 fdobridge: <airlied> either should prove if it's a problem
01:43 fdobridge: <redsheep> Oh flicker is not cured by ZINK_DEBUG=flushsync after all, interesting. Ok building that branch now.
01:50 fdobridge: <redsheep> Ok builds done, going back to no zink debug variable, let's see what's what
01:52 fdobridge: <zmike.> if ZINK_DEBUG=flushsync didn't fix it then that MR won't have any effect
01:53 fdobridge: <redsheep> It fixed sessions crashing, just not flicker or chromium corruption
01:56 fdobridge: <redsheep> That appears to have cured the crashing, both segfaults and channel killed
01:57 fdobridge: <redsheep> I will keep testing to let you know if it changes, but I think it's likely safe to add closes #11161
01:58 fdobridge: <redsheep> I opened Kate like 30 times, and dolphin probably 80 times without either issue occurring
02:10 fdobridge: <redsheep> I am having a very difficult time replicating flicker with that branch too. I think there's a very solid chance that MR closes 11162 as well
02:12 fdobridge: <redsheep> It didn't fix the plasma Wayland session but still, with flicker and crashing gone I would say that at least the plasma x11 session is now better than the nvc0 session.
02:22 fdobridge: <redsheep> I am really happy: aside from visual chromium stuff (which is usually just a minor annoyance) this session is now really, really solid. I just can't find a way to break it. Thanks for the fix @airlied
02:36 fdobridge: <redsheep> Ok I have managed to get flicker to reappear but it is comparatively very rare now, for whatever reason. I will retest again once it merges and then close the issue, just to make sure the revert works just as well
06:31 fdobridge: <djdeath3483> @gfxstrand looks like the weak_ref pipeline cache makes implementation of EXT_shader_module_identifier interesting
06:32 fdobridge: <djdeath3483> @gfxstrand some of the CTS tests create a shadermodule, get the identifier, destroy the module and expect the pipeline creation that follows to find the spirv/nir
06:38 fdobridge: <djdeath3483> ah... we're not handling this case correctly...
06:38 fdobridge: <djdeath3483> dammit
06:38 fdobridge: <djdeath3483> will need to move to the common pipeline code at some point...
06:41 fdobridge: <!DodoNVK (she) 🇱🇹> Commonized RADV/ANV would be an interesting benchmark for the common Vulkan runtime :triangle_nvk:
08:54 fdobridge: <triang3l> But first commonized vs. non-commonized
12:22 fdobridge: <zmike.> I don't understand why I have rust build issues literally every time I try to update my nvk build
12:22 fdobridge: <zmike.> how is the tooling this bad
12:29 fdobridge: <gfxstrand> The common pipeline code might be busted, too. What's the bug?
12:39 fdobridge: <!DodoNVK (she) 🇱🇹> So when will it have no bugs (just like zink)? 😅
12:47 fdobridge: <esdrastarsis> Any progress with the wayland issue?
12:57 fdobridge: <djdeath3483> No, seems good actually, it's returning COMPILE_REQUIRED_EXT
13:03 fdobridge: <gfxstrand> Oh, Zink has bugs...
13:11 fdobridge: <zmike.> sounds like propaganda
13:33 fdobridge: <gfxstrand> Yeah, we still have some sort of fencing issue. I seem to hit it the more NVKs I have trying to talk to each other. I don't think it's NVK's fault, though, because the kernel APIs we're using are supposed to be robust.
13:42 fdobridge: <redsheep> Yeah I did get that to happen one more time after the revert for the seg faults. At least they're not common, and don't usually blow up nearly as much as the seg faults did.
13:44 fdobridge: <!DodoNVK (she) 🇱🇹> Could NVK on Kepler expose Vulkan 1.3 for applications that don't use scalarBlockLayout (like DXVK)? Vulkan conformance probably won't be possible, but at least some modern stuff will work :nope_gears:
13:46 fdobridge: <redsheep> Are you referring to the blank plasma Wayland session? That's still just as it was, I retested.
13:47 fdobridge: <esdrastarsis> This issue: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11166
13:50 fdobridge: <redsheep> Oh that. I wonder if the two are related.
14:07 fdobridge: <gfxstrand> Kepler doesn't even support storage images right now
14:07 fdobridge: <gfxstrand> And it will never be able to do the memory model
16:26 fdobridge: <gfxstrand> @karolherbst For bindless cbufs, is the 14 bits of size size/4 - 1 or size/4?
16:26 fdobridge: <gfxstrand> And if it's size/4, I wonder how 2^16B UBOs work.
16:27 fdobridge: <gfxstrand> If it's size/4-1, I wonder how null descriptors work.
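The dilemma gfxstrand raises can be checked with a little arithmetic, assuming (as she does at this point, before karolherbst questions it) a 14-bit field counting 4-byte units. Storing `size/4` tops out at 65532 bytes, one unit short of 64 KiB, while storing `size/4 - 1` reaches 64 KiB but leaves no value for a zero-sized null descriptor. The helper names below are hypothetical.

```python
from typing import Optional

SIZE_BITS = 14                      # the width assumed in the question; not confirmed
FIELD_MAX = (1 << SIZE_BITS) - 1    # 16383


def encode_size_div4(size_bytes: int) -> Optional[int]:
    """Field holds size/4; returns None if the size is not representable."""
    value = size_bytes // 4
    return value if 0 <= value <= FIELD_MAX else None


def encode_size_div4_minus1(size_bytes: int) -> Optional[int]:
    """Field holds size/4 - 1; returns None if the size is not representable."""
    value = size_bytes // 4 - 1
    return value if 0 <= value <= FIELD_MAX else None
```

Either way one edge case falls out of a 14-bit field, which is why the actual field width matters.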
16:27 fdobridge: <karolherbst🐧🦀> why do you think that the size is just 14 bits?
16:27 fdobridge: <gfxstrand> It's a 64-bit value. I thought it was 40+14
16:27 fdobridge: <karolherbst🐧🦀> it's 19 imho
16:27 fdobridge: <gfxstrand> Oh...
16:28 fdobridge: <gfxstrand> So the address is shifted, too?
16:28 fdobridge: <karolherbst🐧🦀> last 4 bits are truncated
16:28 fdobridge: <gfxstrand> That would make sense, I guess
16:28 fdobridge: <karolherbst🐧🦀> the address is 16 byte aligned then
16:28 fdobridge: <gfxstrand> Right
16:29 fdobridge: <karolherbst🐧🦀> there is one thing you need to be careful about
16:29 fdobridge: <karolherbst🐧🦀> RZ + unsigned 16 bit immediate, but Ra + _signed_ 16 bit immediate
16:29 fdobridge: <karolherbst🐧🦀> (so if you set a non zero reg offset, the constant offset turns signed)
16:30 fdobridge: <gfxstrand> Yeah, that's fine
16:30 fdobridge: <karolherbst🐧🦀> but I think that's normal cbuf behavior?
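A minimal Python model of the offset rule karolherbst describes: with RZ as the base register the 16-bit immediate is zero-extended (0..65535), but with any other register it is sign-extended (-32768..32767). The helper names are hypothetical, RZ is modeled as `None`, and this covers only the offset arithmetic, not encoding or bounds checking.

```python
from typing import Optional


def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def cbuf_offset(reg_value: Optional[int], imm16: int) -> int:
    """Effective byte offset for a [reg + imm] cbuf access.

    reg_value=None models RZ: the immediate is zero-extended.
    Any other register: the immediate is sign-extended.
    """
    if reg_value is None:
        return imm16 & 0xFFFF
    return reg_value + sign_extend16(imm16)
```

So the same immediate `0xFFF0` means offset 65520 with RZ but "register minus 16" with a real register, which a compiler has to account for when folding constants into the immediate.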
16:30 fdobridge: <gfxstrand> I'm more concerned with the handle right now
16:30 fdobridge: <karolherbst🐧🦀> I don't think the size is shifted
16:30 fdobridge: <karolherbst🐧🦀> it just has an alignment req
16:30 fdobridge: <gfxstrand> So it's 36 bits of addr>>4 and 18 bits of size?
16:31 fdobridge: <karolherbst🐧🦀> 45 bits of the addr
16:31 fdobridge: <karolherbst🐧🦀> all bits are used
16:31 fdobridge: <gfxstrand> Wait, what?
16:31 fdobridge: <gfxstrand> Can you give me the actual layout?
16:31 fdobridge: <gfxstrand> I think we're failing at descriptions here
16:31 fdobridge: <karolherbst🐧🦀> it's 19 MSB for the size, and 45 LSB for the address
16:32 fdobridge: <gfxstrand> Right, okay
16:32 fdobridge: <gfxstrand> That does add to 64
16:32 fdobridge: <gfxstrand> I've been mathing wrong
16:32 fdobridge: <gfxstrand> And is the address shifted or is it just required to be aligned to 16B?
16:33 fdobridge: <karolherbst🐧🦀> the lower bits are truncated for the address and the size needs to be 16 byte aligned
16:33 fdobridge: <gfxstrand> What do you mean "truncated"?
16:33 fdobridge: <karolherbst🐧🦀> ignored probably
16:33 fdobridge: <gfxstrand> Okay but it's not shifted
16:33 fdobridge: <karolherbst🐧🦀> not to my knowledge
16:33 fdobridge: <gfxstrand> Okay
16:34 fdobridge: <karolherbst🐧🦀> even though the size is 19 bits, it's capped at 64k
16:34 fdobridge: <karolherbst🐧🦀> well
16:34 fdobridge: <karolherbst🐧🦀> not sure if it's capped or what happens if you set a size too big
16:35 fdobridge: <gfxstrand> Yeah, that's fine
16:35 fdobridge: <karolherbst🐧🦀> also.. you can only use the indexed addressing mode, none of those fancy ones
16:35 fdobridge: <karolherbst🐧🦀> (for `LDC` e.g.)
16:36 fdobridge: <gfxstrand> Yeah, that's fine
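Putting karolherbst's description together: a 64-bit bindless cbuf handle with the size in the top 19 bits and the address in the low 45 bits, the address not shifted, and 16-byte alignment expected for both. A minimal Python sketch of packing and unpacking such a handle follows. The function names are hypothetical, and rejecting misaligned values and sizes above 64 KiB is a conservative assumption, since the log only says the low address bits are probably ignored and that it's unclear what happens with an oversized size.

```python
ADDR_BITS = 45
SIZE_BITS = 19
ADDR_MASK = (1 << ADDR_BITS) - 1
SIZE_MASK = (1 << SIZE_BITS) - 1
MAX_CBUF_SIZE = 0x10000  # 64 KiB; assumed cap, hardware behavior above it is unknown


def pack_cbuf_handle(addr: int, size: int) -> int:
    """Pack an unshifted address (low 45 bits) and a byte size (top 19 bits)."""
    if addr & 0xF or size & 0xF:
        raise ValueError("address and size must be 16-byte aligned")
    if addr > ADDR_MASK:
        raise ValueError("address does not fit in 45 bits")
    if size > MAX_CBUF_SIZE:
        raise ValueError("size above 64 KiB")
    return (size << ADDR_BITS) | addr


def unpack_cbuf_handle(handle: int) -> tuple:
    """Return (addr, size) from a packed 64-bit handle."""
    return handle & ADDR_MASK, (handle >> ADDR_BITS) & SIZE_MASK
```

A 19-bit byte size holds 65536 with room to spare, and a size of 0 is still representable, so neither edge case from the 14-bit guess applies here.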
16:39 fdobridge: <redsheep> Hmm I wonder if it's 45 because that's the number of address bits the hardware is wired for... That's in the ballpark
16:42 fdobridge: <redsheep> That's a 35 TB address space, sounds about right
16:43 fdobridge: <nanokatze> the hw does 48 bits of virtual
16:43 fdobridge: <nanokatze> or 40
16:43 fdobridge: <nanokatze> I forgor but I observed that long ago by allocating carefully sized sparse buffers on prop
16:43 fdobridge: <nanokatze> until oom
18:06 fdobridge: <gfxstrand> 40 but 45 gives room for expansion
18:07 fdobridge: <gfxstrand> Intel is 48 to match the CPU
18:15 HdkR: level-5 57-bit paging called, it wants to map all the GPUs over a PCIe fabric
18:21 airlied: cache coherency called and wants its page table back
18:22 HdkR: :D
18:27 fdobridge: <mohamexiety> CXL my beloved
18:53 fdobridge: <djdeath3483> 57bits on Xe2 I think
18:54 fdobridge: <dadschoorse> to match newer cpus?
18:55 HdkR: Would have to be, otherwise wouldn't make much sense to go that high
18:55 HdkR: For all those datacenter Xe2 GPUs
18:58 fdobridge: <babblebones> Forgive me the 8billionth ask, but any progress on displayport audio for nouveau?
19:22 fdobridge: <djdeath3483> Actually it was Ponte Vecchio
20:39 fdobridge: <gfxstrand> Yeah, there were some where they were doing more bits
21:26 fdobridge: <redsheep> I guess 57 bits makes sense if you want an entire exascale supercomputer all in one address space
21:27 fdobridge: <redsheep> 45 is already quite a lot though for a GPU, unless you want to take Jensen at his word that an entire DGX supercomputer is a GPU lol
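The address-width figures traded above (40 bits observed on the hardware, 45 in the cbuf handle, 48 on Intel, 57 with 5-level paging) are easy to sanity-check. This short Python sketch only does the arithmetic, reporting decimal terabytes alongside binary tebibytes; it makes no claim about any particular GPU.

```python
def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses for the given address width."""
    return 1 << bits


def describe(bits: int) -> str:
    size = address_space_bytes(bits)
    return f"{bits} bits: {size / 1e12:,.1f} TB ({size // 2**40:,} TiB)"


for bits in (40, 45, 48, 57):
    print(describe(bits))
```

This confirms the "35 TB" estimate for 45 bits in decimal units (32 TiB in binary units).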
22:49 fdobridge: <gfxstrand> Ugh... This uniform hardware is so dumb....
22:59 fdobridge: <redsheep> What part about it is dumb?
23:18 fdobridge: <gfxstrand> https://mastodon.gamedev.place/@gfxstrand/112492858884532674
23:19 HdkR: The uniform stuff is fancy like that :D
23:22 fdobridge: <karolherbst🐧🦀> @gfxstrand there is a uniform version of f2f?
23:23 fdobridge: <karolherbst🐧🦀> `That means if you mix float and integer at all, you're going to end up going to the full wave.` that's not true
23:23 fdobridge: <karolherbst🐧🦀> ohh wait.. uhm...
23:23 fdobridge: <karolherbst🐧🦀> nvm me
23:25 fdobridge: <karolherbst🐧🦀> `Also, there's no encoding for uniform predicates being used as actual instruction predicates as far as I know. This means control-flow can't take them.` any U* instruction uses uniform predicates
23:27 fdobridge: <karolherbst🐧🦀> but any normal instruction can use a uniform reg as its sources
23:27 fdobridge: <karolherbst🐧🦀> uhm.. source
23:31 fdobridge: <redsheep> I don't fully understand, but I can say I am not at all surprised the uniform hardware is really limited. The entire idea IIUC is to use as little silicon as possible to keep the more capable hardware more active. Also, I believe Ampere made it way more capable again to increase fp32 throughput
23:33 fdobridge: <redsheep> Turing should be the only generation where all this is integer only, I believe
23:40 fdobridge: <gfxstrand> Right. Yeah, I would expect predicating a uniform op to be uniform
23:40 fdobridge: <gfxstrand> I haven't fuzzed predicates
23:42 fdobridge: <gfxstrand> Yeah, so predicates on uniform ops are uniform which makes sense
23:42 fdobridge: <gfxstrand> But I can't plumb a UP into a SEL and that's really annoying
23:43 fdobridge: <karolherbst🐧🦀> mhhh
23:44 fdobridge: <gfxstrand> They pretty much exist for USEL and predicating other uniform ops
23:45 fdobridge: <karolherbst🐧🦀> yeah...
23:45 fdobridge: <karolherbst🐧🦀> I wouldn't be surprised if nvidia implements it as some postprocessing opt just converting things if it doesn't add pointless overhead
23:47 fdobridge: <gfxstrand> That's a valid approach, TBH.
23:47 fdobridge: <gfxstrand> There's only one thing that requires UGPRs and that's cbufs
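karolherbst's guess is that NVIDIA might pick uniform instructions in a post-pass rather than up front. A toy Python sketch of that idea follows: walk straight-line SSA in order and mark a value uniform only if every source is uniform and the op has a uniform encoding. The op names in `HAS_UNIFORM_FORM` and the `Instr` type are invented for illustration, not the real Turing/Ampere U* instruction list, and a real pass would also have to handle control flow and divergence.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical set of ops that have a uniform (U*) encoding in this toy model.
HAS_UNIFORM_FORM = {"iadd3", "shf", "sel", "mov"}


@dataclass
class Instr:
    dst: str
    op: str
    srcs: List[str]


def promote_to_uniform(prog: List[Instr], uniform_inputs: Set[str]) -> Dict[str, bool]:
    """Map each SSA value to whether it can live in a uniform register."""
    uniform = {name: True for name in uniform_inputs}
    for ins in prog:  # straight-line SSA: every def precedes its uses
        srcs_uniform = all(uniform.get(s, False) for s in ins.srcs)
        uniform[ins.dst] = srcs_uniform and ins.op in HAS_UNIFORM_FORM
    return uniform
```

Note how a value with all-uniform sources still drops to a regular register when its op has no uniform form, which is the "mixing float and integer ends up on the full wave" effect from the Mastodon thread.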
23:50 fdobridge: <karolherbst🐧🦀> but thanks for pointing out R2UR... I knew it existed but I entirely forgot the name :ferrisUpsideDown:
23:50 fdobridge: <karolherbst🐧🦀> however
23:50 fdobridge: <karolherbst🐧🦀> R2UR is _super_ funky
23:51 fdobridge: <karolherbst🐧🦀> do you want to figure it out? Because uhm... it's quite the instruction allowing probably funky things
23:59 fdobridge: <gfxstrand> I've got it doing the one thing I need right now
23:59 fdobridge: <gfxstrand> I'll figure out more as I need it