00:15pabs:wonders where fdobridge is bridging to
00:20airlied: pabs: discord
00:20karolherbst: there are no mems there
00:20karolherbst: *memes
00:41pabs:ew
00:43karolherbst: yeah..... atm just trying out various things and to know what's out there, but I have the feeling that if fdo switches over to something different than IRC, it might be zulip
00:43karolherbst: it's saw how everything sucks in their own way
00:44karolherbst: *sad
00:45karolherbst: not gonna lie, discord is the best platform from a feature perspective and the only downside is it's not FOSS and you can't host it yourself
00:45airlied: and they ban people very quick
00:45karolherbst: everything else is just sad in comparison
00:45karolherbst: yeah.. that's a "can't host it yourself" problem though
00:46airlied: I count it as a major negative if your contributors can't use the service no matter who hosts it
00:46karolherbst: sometimes thinking about pulling the bridges, but some users won't use IRC
00:46i509VCB: There was a self hosted alternative to discord I recall, can't remember the name off the top of my head
00:46karolherbst: there are a few
00:46karolherbst: but zulip seems to be the closest one
00:46karolherbst: didn't try it out though
00:46karolherbst: that's what I plan for the weekend
00:47i509VCB: Not zulip
00:47airlied: like it's inconvenient when you can't chat to your gamer friends, but when you can't do work because a platform bans you it's not a suitable platform for working on
00:48karolherbst: yeah, I don't disagree
00:48karolherbst: just saying it's the best alternative if it wouldn't be for that
00:48airlied: I thought matrix was the new irc
00:48karolherbst: ewww
00:49i509VCB: Revolt was what I was thinking of
00:49karolherbst: matrix is like IRC just persistent messages
00:49karolherbst: ahhh
00:50karolherbst: i509VCB: revolt can't do oauth logins, can it?
00:51i509VCB: No idea, I haven't been involved with the group that experimented with it in a while
00:51i509VCB: I wouldn't mind an fdo discourse instance to be fair
00:51karolherbst: yeah.... I kind of want to require oauth so we can plug in our fdo gitlab server there
00:52karolherbst: I think discourse is a bit too much
00:52karolherbst: discourse isn't really a chat platform
00:53karolherbst: zulip has this interesting mix of being mainly a chat, but also allows you to do more threading type of conversation
00:53karolherbst: also I'd like to have video/screensharing integration for extra lols
00:53karolherbst: like what if you want to get together with 2 or 3 others and do some coding together
00:53i509VCB: The druid folks live on the xi editor zulip and it seems to work fine for them
00:54karolherbst: or want to give a quick ad hoc presentation outside of XDCs timeline
00:54karolherbst: mattermost would be also an idea, but that's also a bit too much
00:55karolherbst: I'd consider revolt if we could login via gitlab
00:55i509VCB: Yeah the problem is discord vertically integrated all those features and we have to use irc, jitsi and then xdc
00:55karolherbst: but it doesn't seem to support that
00:55i509VCB: karolherbst: I checked there is an oauth issue but no oauth
00:55karolherbst: yeah..
00:55karolherbst: my feeling says that it probably will be zulip
00:55i509VCB: Rust backend though :)
00:56karolherbst: right
00:57karolherbst: anyway, my plan is to set up a zulip server locally and if I think it's worth a shot try to host it somewhere and do the same as on discord atm. At least we can use the same bridge as it also supports zulip
00:59karolherbst: ohhhhh wait
01:00karolherbst: zulips hosting is free for open source projects.. interesting...
01:01karolherbst: it usually costs $6.67 per user per month.. wondering what they say if we come around with like 1000 of users :D
01:12i509VCB: Probably less expensive than Rust's zulip still
01:14karolherbst: ehhh.. mhh.. maybe zulip isn't it in the end...
01:15i509VCB: I wonder when "fine we will write it ourselves" will appear as an option
01:15karolherbst: uhm....
01:16karolherbst: I guess I'd rather add support for gitlab oauth to Revolt
01:19airlied: karolherbst: you know about address space limits?
01:19karolherbst: mhh? in which sense you mena?
01:19karolherbst: *mean
01:19airlied: like what parts of the hw have bitsize limits
01:20karolherbst: ahh.. yeah
01:20karolherbst: I think push buffers have some weirdo 40 bit limit
01:21airlied: seeing a compute test fail if I allocate from the top of 40-bit space, but passes if I set vma to alloc from bottom
01:21karolherbst: but besides that not really. Normally it's all 48 bit
01:21karolherbst: airlied: yeah.. try to only allocate push buffers below 40 bit
01:21karolherbst: that might just fix it
01:22airlied:thought I was doing everything below 40
01:22karolherbst: could be the limit was something else
01:22airlied: yeah wondering if the qmd stuff has other limits
01:22karolherbst: but I think skeggsb_ said something about them being special
01:22karolherbst: mhhh
01:22karolherbst: potentially?
01:23karolherbst: maybe some alignment problem instead?
01:23karolherbst: does the hardware complain? and how?
01:23airlied: yeah it could be that as well
01:23airlied: lots of traps with MULTIPLE_WARP_ERRORS
01:23karolherbst: okay..
01:23karolherbst: so the shader is running
01:24karolherbst: well..
01:24karolherbst: but what kind of warp errors?
01:24airlied: https://paste.centos.org/view/03f772b1 is an example
01:25airlied: I was hoping we'd just messed up another mask, but can't spot it :-P
01:25karolherbst: looks like images are busted
01:25karolherbst: at least the TPC complains
01:26karolherbst: airlied: what about compute shaders not doing image related things?
01:27airlied: test has ssbo only I think
01:27airlied: dEQP-VK.binding_model.buffer_device_address.set0.depth2.basessbo.load.store.multi.std140.comp
01:27karolherbst: depth?
01:27karolherbst: could dump the nir
01:29airlied: oh it does indeed have an image
01:29karolherbst: yeah.. would be surprised why the TPC complains otherwise
01:30karolherbst: but would also be nice if we'd know why it complains
01:41karolherbst: airlied: anyway, my bet would be on badly aligned images :p
01:41karolherbst: also.. sleep
02:11airlied: jekstrand: you understand dma fence errors? does userspace get notified via syncobj wait if a fence is signalled in error?
02:11airlied: or how is userspace expected to notice that sort of failure?
02:22airlied: dakr: ^ was asking about it
02:32dakr: airlied, jekstrand: afaics syncobj don't consider dma_fence_get_status(), hence if a fence gets signaled with fence->error being set the syncobj wait ioctl() should still return 0
03:07airlied: karolherbst: I think it's a bug around bindless image address and sustp
03:07airlied: so probably in codegen
03:21airlied: jekstrand: I think there's some bug with bindless images address > 32 bits
03:22airlied: might be in descriptor lowering, but I don't totally understand it all yet
03:33airlied: hmm might not be that either
03:33airlied: definitely a 32-bit limit somewhere in buffer or image encoding
07:24airlied: Pass: 223283, Fail: 4841, Crash: 867, Warn: 4, Skip: 1394954, Timeout: 3, Flake: 13304, Duration: 3:16:35, Remaining: 0
07:24airlied: seems better with alloc vma space from lower bounds
09:25fdobridge: <karolherbst> I suspect codegen has terrible 64 ptr handling bugs
09:25fdobridge: <karolherbst> normally we only end up with ptrs below 32 bit
14:15fdobridge: <jekstrand> fence errors are evil. We shouldn't rely on them.
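The address limits discussed above (a 40-bit limit for push buffers, and a suspected 32-bit limit somewhere in buffer or image encoding) can be illustrated with a small sketch. Both helper names, `fits_below` and `truncate_to_bits`, are hypothetical and are not part of nouveau, NVK, or codegen; the sketch only shows why an allocation near the top of the 40-bit space could break a path that silently keeps just the low 32 bits of an address.

```python
def fits_below(addr: int, size: int, bits: int) -> bool:
    """Return True if the range [addr, addr + size) lies entirely below 2**bits.

    Hypothetical helper for illustration only.
    """
    if addr < 0 or size <= 0:
        raise ValueError("addr must be non-negative and size must be positive")
    return addr + size <= (1 << bits)


def truncate_to_bits(addr: int, bits: int) -> int:
    """Model a field that only stores the low `bits` bits of an address."""
    return addr & ((1 << bits) - 1)


# An allocation placed near the top of the 40-bit space (assumed example value).
top_of_40 = (1 << 40) - 0x10000
# An allocation placed near the bottom of the address space (assumed example value).
low = 0x10000

# Both ranges respect a 40-bit limit...
both_fit_40 = fits_below(top_of_40, 0x10000, 40) and fits_below(low, 0x10000, 40)
# ...but only the low one survives being squeezed into a 32-bit field.
high_survives_32 = truncate_to_bits(top_of_40, 32) == top_of_40
low_survives_32 = truncate_to_bits(low, 32) == low
```

Under these assumptions, allocating from the bottom of the VMA range would hide a 32-bit truncation bug rather than fix it, which matches the observation that results look better when allocating from the lower bounds.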
14:19dakr: jekstrand, In nouveau, when a channel faults, the fence is signaled (and fence->error is set) when the fence context is killed. In such a case a syncobj wait ioctl would return with 0. How would userspace know it can't trust the result of the exec job without considering the fence error?
14:27fdobridge: <jekstrand> It should be looking for any sort of device loss event which can be communicated in other ways.
14:28fdobridge: <jekstrand> Fence errors are mostly ok if they remain within a single context but the moment you start leaking them around, things get tricky.
14:29fdobridge: <jekstrand> We had problems with i915 for a while where a hang in a client app would kill the X server because we were trying to preserve fence errors.
14:29fdobridge: <jekstrand> So maybe "evil" was a bit strong but we shouldn't rely on them as the only mechanism and we need to be very careful how we let them propagate.
14:30fdobridge: <jekstrand> From a userspace driver POV, we need some sort of very fast query for "is my channel still alive?" which gets checked periodically.
14:33dakr: jekstrand, the only other way to query it would probably be another ioctl() to check after the syncobj wait ioctl returns..
14:35dakr: actually, I probably wouldn't consider the fence->error as return value for the syncobj wait ioctl, but put it into struct drm_syncobj_wait and struct drm_syncobj_timeline_wait, both have a 32bit padding field at the end which could be used..
14:35fdobridge: <jekstrand> Yup, it means another ioctl.
14:35fdobridge: <jekstrand> But that's important because when you do syncobj wait, you have no idea where that fence came from. It may have come from your context or some entirely unrelated context.
14:36fdobridge: <jekstrand> You don't want to kill your context just because some unrelated context hung
14:36fdobridge: <karolherbst> we kind of already have that ioctl, it's just very stupid to use
14:36fdobridge: <jekstrand> No, dummy submit is not that ioctl
14:36fdobridge: <karolherbst> I know, that's why I said stupid
14:44dakr: jekstrand, for a certain uapi we should know the context the syncobj's fence came from and hence what the error code means.
14:53fdobridge: <jekstrand> If you assume that the kernel wraps every fence it receives in another context-specific fence before sticking it on the syncobj, yes. That's an implementation detail, not something userspace should assume.
14:54fdobridge: <jekstrand> Especially when you start sharing synchronization primitives across processes which is what drm_syncobj is designed for.
15:26fdobridge: <jekstrand> In particular, if we have an imported semaphore or fence, we can't tell the difference between one that was exported from our context and re-imported vs. something that came from some other context.
15:26fdobridge: <jekstrand> We could, in theory, track all the semaphores which have never and will never be shared and trust the error codes coming from them and ignore everything that might be shared.
15:27fdobridge: <jekstrand> If we need to do that for a perf reason, I guess we could. Seems easier to just ioctl every once in a while, though.
15:53dakr: jekstrand, sure, that's what I mean, it would only be meaningful for syncobjs that were directly attached to an operation via a driver specific ioctl, where the result is defined for this particular uapi. for all other cases, where a syncobj is fed into a chain of operations, userspace can't detect if all of the chain's operations were successful or not anyways.
15:56dakr: but yeah, another ioctl would probably be more obvious.
16:01dakr: though, the result of the ioctl wouldn't necessarily relate to the job that was signaled to be finished, since in between the ioctls another queued job could have caused the channel fault.
16:08fdobridge: <jekstrand> Sure, but if a queued thing faulted, it may have also scribbled over memory used by the thing you're waiting on meaning that you can't trust those results either, regardless of whether or not it was that job that faulted.
17:26dakr: jekstrand, yeah, true. then I'll probably just add another ioctl. though as karolherbst already mentioned, a dummy exec ioctl would return ENODEV too if the channel is dead.
17:28dakr: airlied, I just pushed split/merge support if you're curious, it's still wip and only tested against not breaking existing functionality though: https://gitlab.freedesktop.org/nouvelles/kernel/-/commits/wip-split-merge
17:30jekstrand: dakr: Yeah, but a dummy exec requires you actually have something to submit...
17:31jekstrand: Speaking of... airlied: The new submit ioctl should allow zero pushbufs.
17:32dakr: jekstrand, it does allow it.
17:32jekstrand: Oh, neat. Then why are we submitting a pushbuf... Maybe because we're still depending on implicit sync for our explicit sync?
17:36karolherbst: jekstrand: it still returns ENODEV if the channel is dead
17:37karolherbst: it's the first check
17:37jekstrand: Ok, that's not quite as bad then
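The fence-error discussion above can be summarized with a toy model. It is not the DRM uAPI: `Fence`, `Channel`, `syncobj_wait`, `channel_alive`, and `result_trustworthy` are all hypothetical names, and the errno values are stand-in constants. The sketch only captures the behaviour described in the conversation: the wait reports success once a fence is signaled even if the fence carries an error, so userspace needs a separate liveness check on its own channel.

```python
from dataclasses import dataclass

# Stand-in errno-style constants for illustration (assumed values).
ENODEV = 19
ETIME = 62


@dataclass
class Fence:
    signaled: bool = False
    error: int = 0  # 0 means no error, otherwise a negative errno-style value


@dataclass
class Channel:
    dead: bool = False

    def fault(self, fence: Fence, error: int) -> None:
        """Model a channel fault: the channel dies and the pending fence is
        signaled with an error set."""
        self.dead = True
        fence.error = error
        fence.signaled = True


def syncobj_wait(fence: Fence) -> int:
    """Toy wait: returns 0 once the fence is signaled and ignores fence.error,
    mirroring the behaviour described in the conversation."""
    if not fence.signaled:
        return -ETIME
    return 0


def channel_alive(channel: Channel) -> bool:
    """Toy version of the fast 'is my channel still alive?' query."""
    return not channel.dead


def result_trustworthy(fence: Fence, channel: Channel) -> bool:
    """Trust a result only if the wait succeeded and our own channel is alive."""
    return syncobj_wait(fence) == 0 and channel_alive(channel)
```

The liveness check is tied to the caller's own channel rather than to the fence, which reflects the point that a fence on a shared syncobj may have come from an unrelated context.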
17:42jekstrand: Maybe I'll poke at modifiers again today...
17:46jekstrand: And now I'm remembering why I gave up, lol.
17:46jekstrand: linear....
18:04HdkR: Who needs any modifier beyond linear? :P
18:04jekstrand: Linear is the problem. :P
18:04jekstrand: NV can't do linear
18:09HdkR: whaaaa
18:10karolherbst: well.. it can, just not when rendering depth/stencil
18:10jekstrand: More specifically, it can't render to linear if you also have a Z/S buffer.
18:10jekstrand: Which means that supporting the linear modifier is going to be "fun"
18:10jekstrand: We can do literally everything else with it.
18:10jekstrand: One option is to allocate a tiled shadow buffer, render to that, and then copy at the end of the render pass.
18:11jekstrand: Or we can kick that to the WSI present code somehow.
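The shadow-buffer option mentioned above can be sketched as a small decision function. This is a toy model, not NVK code: `needs_tiled_shadow` and `plan_render_pass` are hypothetical names, and the step strings are only descriptive labels. It assumes the constraint as stated in the conversation, namely that rendering to a linear target is only a problem when a depth/stencil buffer is also bound.

```python
from typing import List


def needs_tiled_shadow(color_is_linear: bool, has_depth_stencil: bool) -> bool:
    """A tiled shadow is needed only when rendering to a linear color target
    while a depth/stencil buffer is bound (constraint as described above)."""
    return color_is_linear and has_depth_stencil


def plan_render_pass(color_is_linear: bool, has_depth_stencil: bool) -> List[str]:
    """Return the descriptive steps for one render pass in this toy model."""
    if needs_tiled_shadow(color_is_linear, has_depth_stencil):
        return [
            "allocate tiled shadow buffer",
            "render to tiled shadow buffer",
            "copy shadow to linear image at end of render pass",
        ]
    return ["render directly to the target"]
```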
18:12jekstrand: But the more immediate problem is that I need to figure out why my system GL is giving me bogus modifiers. :-/
18:12karolherbst: probably because it's weirdly broken since forever
18:12karolherbst: just works well enough
18:12jekstrand: that's not a real explanation
18:26jekstrand: Of course using the piglit test returns a different set of modifiers than either Vulkan or X11
18:30jekstrand: ugh... I bet the bogus modifiers are coming from KMS.