01:15 imirkin: karolherbst: CUDA docs can say whatever they want... https://github.com/envytools/envytools/blob/master/envydis/g80.c#L178 -- check what that flag enables. G200 doesn't have it.
01:16 imirkin: although now that i look, that's actually just the texprep (for cube arrays), texquerylod, and texgather
01:16 karolherbst: yeah
01:16 karolherbst: cuda says there is integer atomics on shared mem and the likes
01:16 imirkin: "sm12" is what includes the shared atomics and such
01:16 karolherbst: tex stuff could be missing, dunno
01:16 imirkin: which is g200+
01:16 karolherbst: g200 is sm13
01:16 imirkin: wtvr
01:16 imirkin: in the envydis parlance
01:17 imirkin: the sm12 flag
01:17 karolherbst: right.. I think the only relevant difference should be fp64 though
01:17 karolherbst: not sure what's up with the tex stuff
01:17 imirkin: yeah, fp64 is a separate flag
01:17 imirkin: (and only on g200)
01:18 karolherbst: I might still give it a try and see what the difference is
01:18 mwk: hm
01:18 mwk: what's the question?
01:18 imirkin: none
01:19 imirkin: oh, someone had a question for you about how to build the docs
01:19 mwk: that's handled already
01:19 imirkin: ah ok
01:20 karolherbst: imirkin: but... wikipedia does claim dx 10.0 for gt200 and 10.1 for gt215+
01:21 imirkin: correct
01:21 imirkin: the shared atomics have nothing to do with DX 10.1
01:21 karolherbst: maybe they didn't add that texture stuff to gt200?
01:21 imirkin: correct.
08:25 airlied: skeggsb_: https://lore.kernel.org/dri-devel/20220502163722.3957-1-christian.koenig@amd.com/T/#t fyi
17:22 karolherbst: anholt: btw, I am trying to get https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12189 merged. this should speed up testing a lot when channels are dying
17:23 anholt: we spin loop waiting for fences?
17:23 karolherbst: ......
17:23 karolherbst: yes.....
17:23 anholt: /o\
17:23 karolherbst: it's all terrible
17:23 karolherbst: anholt: Ben is reworking the entire UAPI though
17:24 karolherbst: and patches should be out soonish
17:24 karolherbst: but that doesn't help with older kernels
17:24 karolherbst: anyway
17:24 karolherbst: I will work on reliability issues on nouveau over the next few weeks
17:24 karolherbst: on my CL CTS script
17:24 karolherbst: without: Pass 44 Fails 10 Crashes 5 Timeouts 0: 3%|█████ | 59/2352 [12:12<7:54:46, 12.42s/it]
17:24 karolherbst: with: Pass 131 Fails 93 Crashes 146 Timeouts 0: 16%|███████████████████████████████▍ | 369/2352 [01:24<03:51, 8.55it/s]
17:24 karolherbst: channels are crashing left and right
17:24 karolherbst: but...
17:25 karolherbst: it progresses way faster
17:25 karolherbst: the main reason for me to get it merged is that sometimes userspace locks up in a "display frozen" way until we stop waiting on fences
17:26 karolherbst: honestly.. writing a new mesa driver for nouveau wouldn't be the worst idea...
17:27 anholt: why aren't we just using CPU_PREP ioctl?
17:27 karolherbst: we do
17:27 anholt: for these waits
17:27 karolherbst: that happens inside the busy loop
17:27 karolherbst: ehh wait
17:27 karolherbst: where do we actually call it
17:28 karolherbst: nouveau_bo_wait was it
17:28 karolherbst: but we still do fencing in userspace
17:28 karolherbst: so waiting on bos is "fine", but if you wait on a fence it isn't
17:29 karolherbst: nouveau_fence_update might call bo_wait though.. let me check
17:29 karolherbst: ahh no, it doesn't
17:29 karolherbst: it's all terrible
17:29 anholt: given the uabi you have, a gallium fence should be basically just a ref of a BO, and waiting for the fence should be a CPU_PREP on the BO.
17:29 karolherbst: yeah well.. it isn't
17:30 karolherbst: what we do is, we have the "fence buffer" mapped and just check if a counter increases ...
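(For readers following along: a minimal sketch of the spin-wait being described, loosely modeled on mesa's nouveau fence code; the struct and function names are made up, only the mapped-counter-plus-sched_yield pattern comes from the conversation.)

```c
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>

/* The GPU writes an increasing sequence number into a mapped "fence buffer"
 * as work completes; a fence is just a target value for that counter. */
struct spin_fence {
   uint32_t sequence;                /* value the counter must reach */
   const volatile uint32_t *map;     /* mapped fence buffer */
};

/* Hypothetical wait: poll the counter, occasionally yield the CPU, never
 * actually sleep in the kernel -- this is the busy loop being discussed. */
static bool spin_fence_wait(struct spin_fence *f, unsigned max_spins)
{
   for (unsigned spins = 0; spins < max_spins; spins++) {
      if ((int32_t)(*f->map - f->sequence) >= 0)
         return true;                /* counter has passed: fence signalled */
      if (!(spins % 8))
         sched_yield();              /* the "somewhat fake" sleep */
   }
   return false;                     /* gave up while still busy-waiting */
}
```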
17:31 anholt: and, like, sure. if you want to avoid the trip to the kernel in the happy case, then also track the counter and skip if the counter's already past. but stuffing workarounds in the busy loop that you need to delete anyway is not the way to go.
17:32 karolherbst: well, the question is.. what bo should we wait on?
17:32 anholt: when you create a fence, you grab one of the BOs from the exec. The batchbuffer is traditional.
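(A sketch of what anholt is proposing, written against the public libdrm_nouveau API; everything except struct nouveau_bo, nouveau_bo_wait() and the NOUVEAU_BO_* flags is hypothetical, and the userspace counter check is kept only as a fast path.)

```c
#include <stdbool.h>
#include <stdint.h>
#include <nouveau.h>   /* libdrm_nouveau: struct nouveau_bo, nouveau_bo_wait() */

/* Hypothetical gallium fence: a reference to one BO from the submit
 * (traditionally the batch/pushbuf BO) plus the userspace sequence number. */
struct bo_backed_fence {
   struct nouveau_bo *bo;            /* ref'd when the fence is created */
   uint32_t sequence;
   const volatile uint32_t *seq_map; /* mapped fence buffer, for the fast path */
};

static bool bo_backed_fence_wait(struct bo_backed_fence *f,
                                 struct nouveau_client *client)
{
   /* Fast path: skip the trip to the kernel if the counter is already past. */
   if ((int32_t)(*f->seq_map - f->sequence) >= 0)
      return true;

   /* Slow path: a real kernel wait -- nouveau_bo_wait() goes through the
    * GEM_CPU_PREP ioctl, so the thread sleeps instead of spinning. */
   return nouveau_bo_wait(f->bo, NOUVEAU_BO_RD, client) == 0;
}
```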
17:32 karolherbst: we don't have any code to actually do that correctly
17:34 karolherbst: the entire way we communicate with the kernel needs to get rewritten from scratch
17:34 karolherbst: the compiler anyway, heck even the driver
17:34 karolherbst: I just might push stupid workarounds and wait for zink or something new
17:35 karolherbst: and as we get a new UAPI, it's pointless to start before that
17:42 karolherbst: anholt: anyway.. the entire way how we do command submission sucks big time, but as it stands now, the bos are hidden and we can't access them inside mesa
17:42 karolherbst: so...
17:47 anholt: libdrm was such a bad idea. sigh.
17:47 karolherbst: it was
17:47 karolherbst: anyway. I just want to be pragmatic here and make the pain smaller for users with easy things, even if we have to rework/rewrite stuff later
17:49 anholt: wait, what are you saying about bos being hidden? I see struct nouveau_bo * all over.
17:49 karolherbst: we don't use a bo for command submission directly
17:50 karolherbst: we have a nouveau_pushbuf, and it has an internal list of bos
17:50 karolherbst: that lives inside libdrm
17:51 karolherbst: but implementing a simple nouveau_pushbuf_wait isn't easy, as the current code moves on to different things once we submit/run out of space/whatever
17:51 karolherbst: and then we could break nouveau-ddx. and.....
17:51 karolherbst: ahhhh
17:51 karolherbst: it's all so terrible
17:52 karolherbst: honestly.. I just want to move to zink and fix shit for real in the vulkan driver and just remove nv50/nvc0 until we get proper replacements
17:53 karolherbst: probably moving libdrm into mesa could be a viable solution to hack it up without breaking the ddx...
18:00 imirkin_: anholt: we do userspace spinning for sub-allocated bo fake fences
18:00 imirkin_: we do use nouveau_bo_wait to wait on the "whole" bo
18:00 anholt: but a gallium fence is spinning only?
18:01 imirkin_: depends
18:01 imirkin_: (at least iirc)
18:01 karolherbst: we don't if the work is already finished
18:01 anholt: the code I was reading there, I didn't see a proper sleep path, just loss.
18:02 karolherbst: yeah, it doesn't exist
18:02 imirkin_: anyways, the reason we don't bo_wait is sub-allocated bos
18:02 imirkin_: iirc it calls sched_yield() which is ... somewhat fake
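(A rough illustration of the sub-allocation issue imirkin_ is pointing at; the structs here are invented, only the idea that many small allocations share a single parent nouveau_bo is taken from the discussion.)

```c
#include <stdint.h>
#include <nouveau.h>   /* libdrm_nouveau: struct nouveau_bo, nouveau_bo_wait() */

/* Hypothetical sub-allocator slice: many of these are carved out of one
 * parent BO, and each gets its own userspace "fake fence" sequence. */
struct sub_alloc {
   struct nouveau_bo *parent;   /* the only BO the kernel knows about */
   uint32_t offset, size;       /* slice within the parent */
   uint32_t fence_sequence;     /* per-slice fence, tracked in userspace */
};

/* The kernel can only wait at whole-BO granularity: this stalls on *every*
 * outstanding user of the parent BO, not just this slice, which is why the
 * per-slice fences are currently resolved by spinning on the counter instead. */
static int sub_alloc_wait_coarse(struct sub_alloc *s, struct nouveau_client *client)
{
   return nouveau_bo_wait(s->parent, NOUVEAU_BO_RDWR, client);
}
```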
18:02 karolherbst: imirkin_: we should wait on the bo having the commands
18:02 karolherbst: or not?
18:02 imirkin_: ideally we'd be able to call an ioctl to "sleep" on a fence until an interrupt
18:02 imirkin_: but that'd require uapi
18:03 imirkin_: (the hw totally supports it, obviously)
18:03 imirkin_: (at least since ... nv11 or so)
18:03 anholt: imirkin: the solution I was talking about is you use a BO with your submit ioctls (make an extra one and stick it in your buffer list with no relocs to it, if you need), and then you can use that buffer to wait for a specific submit to finish.
18:04 anholt: like, new uapi is better, yes, but also you can manage without.
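(A sketch of the dummy-BO trick anholt describes, using the public libdrm_nouveau pushbuf API; whether it can be slotted into the current code without libdrm's implicit submits getting in the way is exactly what gets debated below.)

```c
#include <nouveau.h>   /* libdrm_nouveau: nouveau_bo_new(), nouveau_pushbuf_refn(),
                        * nouveau_pushbuf_kick(), nouveau_bo_wait(), nouveau_bo_ref() */

/* Hypothetical "submit marker": a throwaway BO added to the pushbuf's buffer
 * list right before kicking it, so the kernel ties this submit's completion
 * to that BO and a later nouveau_bo_wait() on it waits for exactly this submit. */
static int submit_with_marker(struct nouveau_device *dev,
                              struct nouveau_pushbuf *push,
                              struct nouveau_object *chan,
                              struct nouveau_bo **marker_out)
{
   struct nouveau_bo *marker = NULL;
   int ret = nouveau_bo_new(dev, NOUVEAU_BO_GART, 0, 4096, NULL, &marker);
   if (ret)
      return ret;

   /* No relocs needed; referencing the BO in the submit is enough. */
   struct nouveau_pushbuf_refn ref = {
      .bo = marker,
      .flags = NOUVEAU_BO_GART | NOUVEAU_BO_WR,
   };
   ret = nouveau_pushbuf_refn(push, &ref, 1);
   if (ret) {
      nouveau_bo_ref(NULL, &marker);  /* drop our reference on failure */
      return ret;
   }

   ret = nouveau_pushbuf_kick(push, chan);
   *marker_out = marker;              /* keep the ref around: this is the "fence" */
   return ret;
}

/* Later: nouveau_bo_wait(marker, NOUVEAU_BO_WR, client) sleeps in the kernel
 * until that particular submit has finished. */
```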
18:04 imirkin_: ah i see
18:04 imirkin_: i missed that (was just skimming logs)
18:04 karolherbst: anholt: the issue is, we don't submit explicitly
18:04 karolherbst: submits can literally happen at any time
18:05 karolherbst: although that might not be important in the cases we actually flush explicitly
18:05 anholt: you're saying that because of how the current libdrm interface works, right?
18:06 karolherbst: yeah, more or less
18:06 karolherbst: it's possible to make it not do that though, but.. it's annoying (I did some of that in my multithreading fixes MR)
18:07 karolherbst: anyway.. we also share a command buffer across all threads 🙃🙃🙃
18:07 karolherbst: and I'd like to fix that before actually doing major reworks, which are like required
18:07 anholt: right, multithreading mr needs to land.
18:08 karolherbst: so.. my plan is.. be pragmatic about fixes which really improve qol of users
18:08 karolherbst: do reworks later
18:10 imirkin_: the "fake-ish bo to track specific submit progress" idea is clever - i never considered it before
18:11 airlied: just import libdrm_nouveau into mesa already
18:11 airlied: like it's not going to be major rework to just airlift it in as-is, then you can whittle away at the insanity over time
18:11 karolherbst: yeah.. I guess
18:12 karolherbst: I'd still want to do it after fixing mt in a way which doesn't suck
18:12 karolherbst: I think I finally have an idea on how that all works out without being painful
18:14 imirkin_: that fake bo idea _seems_ like it should avoid the spinning, but i haven't thought it all the way through
18:14 imirkin_: there could be some unfortunate deadlocks, not sure.
18:14 imirkin_: anyways, good luck :)
18:16 karolherbst: anholt: mhh, okay so your idea would be to just add another bo and wait on that.. yeah.. that only works after we move libdrm and make it so it never submits implicitly
18:17 karolherbst: it also requires doing submissions with multiple bos, so we don't have to split midway and stuff
18:17 karolherbst: I wish all of that would be way simpler, but...
18:18 karolherbst: I already spent too much time on fixing fencing + command submission in multithreaded environments and there is a lot of nasty behavior going on, which is annoying
18:19 karolherbst: anyway.. I am convinced that my multithreading fixes are actually working as the android simulator doesn't crash with those :)