00:00 vincenttc: I did that in one test with the same result iirc
00:00 imirkin: ok, well you definitely need it no matter what
00:00 imirkin: the client has to submit commands to the gpu
00:00 imirkin: without that, no drawing, and the "other" side of the dma-buf can't possibly know that's what's happening
00:01 vincenttc: either way, it doesn't appear to be enough
00:02 imirkin: btw, i assume you're using either 32x32 or 64x64 DRM_FORMAT_ARGB8888 images?
00:03 vincenttc: yes
00:03 imirkin: from everything i see, that should give you a "live" view of those images
00:04 karolherbst: HdkR: 32kx32k?
00:04 vincenttc: hmm, strange
00:16 imirkin: vincenttc: to double-check, the dma-buf is generated on the same gpu as it's consumed?
00:17 vincenttc: imirkin: in this case no dma-bufs are involved, I'm just rendering to the same bo as is attached
00:17 imirkin: oh ok
00:19 HdkR: karolherbst: On pascal yea
00:19 imirkin: can you share some code? i may be able to have a look
00:20 imirkin: karolherbst: ok ... so ... the plan is to keep developing my test but artificially limit it to 8k. given that > 8k doesn't seem to work either way, same principles should apply.
00:20 imirkin: karolherbst: unfortunately we do have to get this little tidbit figured out, since 16k is required for GL4
00:22 karolherbst: mhh, I see
00:22 vincenttc: imirkin: the code itself might not be all that readable since it is behind some abstraction but this is the relevant part: https://github.com/swaywm/wlroots/blob/master/backend/drm/drm.c#L696-L711
00:22 vincenttc: plane->surf is created with gbm_surface_create iirc
00:23 vincenttc: crtc_set_cursor calls drmModeSetCursor
00:24 vincenttc: the rendering functions end up in here https://github.com/swaywm/wlroots/blob/master/render/gles2/renderer.c
00:42 imirkin: skeggsb: looks like GP104_COPY class has rev'd where the x/y/z offsets go
00:43 imirkin: srcx/y go into 0x744/748
00:43 imirkin: i can make guesses about the rest :)
00:43 imirkin: should try to devise a test that fails on pascal but not earlier
00:45 imirkin: (this is with c1b5, but i bet c0b5 also has this)
00:45 imirkin: according to HdkR textures can be a lot wider, which would explain the change
00:46 HdkR: The widest you could ever want
00:46 imirkin: i wonder if it maintained compat with the old stuff, which is why we haven't noticed
00:46 imirkin: HdkR: don't underestimate my imagination :p
00:47 HdkR: I think it is 2m width now?
00:47 HdkR: Would need to double check
00:47 imirkin: pfft
00:47 imirkin: i can think of bigger numbers.
00:51 HdkR: workgroup width also went to the moon in compute
00:52 karolherbst: HdkR: again? :/
00:52 HdkR: 2147480000 x 64k x 64k
00:53 karolherbst: that we already have
00:53 HdkR: Ah, alright :)
00:53 karolherbst: we set it to 2147483647 though
00:53 karolherbst: since kepler
00:53 imirkin: since kepler, no?
03:34 imirkin: wow, i think i found the bug
03:34 imirkin: dumbest. bug. ever.
03:41 imirkin: at least logic persists in the universe.
11:47 karolherbst: imirkin: duh! ...
11:55 karolherbst: imirkin: but do we have to increase that to 32768 for gens which support it?
15:43 karolherbst: imirkin: do you think it might be a viable thing to lock all screen operations in order to prevent mt issues in a case where each context already got its own pushbuffer? I think it only has to cover the functions set up by nvc0_screen_init_resource_functions
15:44 karolherbst: and then maybe not even all...
15:58 karolherbst: also, there are nice "dEQP-EGL.functional.sharing.gles2.multithread.*" tests :)
16:00 karolherbst: oops... yeah, they cause a hang really fast on mesa-master
16:11 karolherbst: okay... got another bo use-after-free corruption
17:56 karolherbst: ohh, actually this is a regression from my patches
18:34 imirkin: karolherbst: i think i just want to kill the "eng3d" path with fire... it's too hard to make it work for MS up-sample and downsample properly, esp given the various games we play with disabling MS and so on
18:46 karolherbst: yeah. I highly doubt there are any downsides using u_blitter either. Or does it have a higher overhead?
18:47 imirkin: well, the downside is that it doesn't work
18:47 imirkin: and has to be modified to work with nouveau
18:47 imirkin: other than that it's great :)
18:47 karolherbst: okay sure, but we have to fix our eng3d path also from time to time
18:48 imirkin: the issue is that we (a) have a shitty resolve and (b) have pretty buggy handling of MS in general, in that path
18:48 imirkin: so ... even if it's "faster", i think dumping it is the right move
18:48 imirkin: most of the "fast" cases are handled by the 2d path anyways
18:49 imirkin: i haven't looked in great detail, but i think the u_blitter changes to make it work with nouveau's lack of stencil export should be largely achievable
18:53 karolherbst: yeah, otherwise u_blitter wouldn't be that great if we couldn't teach it that :p
19:17 imirkin: gm200+ has stencil export btw
19:17 imirkin: but that won't help me!
23:00 vincenttc: imirkin: so, after quite a long search I've managed to find out why the disappearing cursor problem that I was having occurred. When a gbm bo is created with GBM_BO_USE_LINEAR it is initially placed in the gart domain. It is pinned to vram when drmModeSetCursor is called, but the rendering operations are presumably still directed at the old location of the bo.
23:00 imirkin: hmmm
23:00 imirkin: would have to see how the pinning works
23:01 imirkin: in principle it shouldn't matter if it's in gart or vram, the VA remains the same
23:08 vincenttc: so, should I report this on the bugtracker?
23:10 imirkin: you could
23:11 imirkin: chances of it getting fixed by someone other than yourself are limited though
23:11 imirkin: since it's a pretty esoteric use-case
23:18 vincenttc: mmh, fair enough
23:18 imirkin: happy to answer any questions though
23:18 vincenttc: if I have some spare time in the future I might try it
23:18 imirkin: and given how far you've tracked it down already, i'm sure you'll be capable of working out the solution
23:18 vincenttc: but since glFinish "fixes" it, it won't be that high of a priority for me either
23:18 imirkin: hmmm
23:19 imirkin: i don't have a good explanation for that
23:19 imirkin: glFinish is just glFlush + wait for fence completion
23:19 vincenttc: it probably just allows the render operations to finish before the buffer is moved from gart to vram
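A toy model of the distinction drawn above, assuming a simple in-order command queue: flush only submits recorded work, while finish submits and then waits until the fence for that work has signalled. Every class and method name here is hypothetical; this is a sketch of the idea, not how Mesa or nouveau implement it.

```python
from collections import deque


class ToyGpuQueue:
    """Hypothetical in-order command queue guarded by a fence counter."""

    def __init__(self):
        self.pending = deque()    # commands recorded but not yet submitted
        self.submitted = deque()  # (seqno, commands) batches awaiting execution
        self.fence_emitted = 0    # seqno of the most recently submitted batch
        self.fence_signalled = 0  # seqno the simulated GPU has completed
        self.log = []             # commands that have actually executed

    def draw(self, name):
        """Record a command; nothing is sent to the GPU yet."""
        self.pending.append(name)

    def flush(self):
        """Submit recorded commands and return their fence seqno. Does not wait."""
        if self.pending:
            self.fence_emitted += 1
            self.submitted.append((self.fence_emitted, list(self.pending)))
            self.pending.clear()
        return self.fence_emitted

    def gpu_step(self):
        """Simulate the GPU executing one submitted batch and signalling its fence."""
        if self.submitted:
            seqno, cmds = self.submitted.popleft()
            self.log.extend(cmds)
            self.fence_signalled = seqno

    def finish(self):
        """Flush, then wait until the fence for the flushed work has signalled."""
        fence = self.flush()
        while self.fence_signalled < fence:
            self.gpu_step()
```

In this model, anything that touches the buffer right after `flush()` can still run ahead of the rendering, whereas after `finish()` the rendering is guaranteed to have completed.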
23:20 imirkin: let me quickly see how drmModeSetCursor works exactly
23:25 imirkin: skeggsb: looks like we never define a cursor_set function?
23:26 imirkin: vincenttc: do you happen to get an error when you use drmModeSetCursor?
23:26 imirkin: oh wait, nm.
23:26 imirkin: we have crtc->cursor set.
23:27 skeggsb: imirkin: it's not set because that's deprecated, cursor plane is used instead
23:27 imirkin: yeah
23:28 vincenttc: imirkin: the move happens in nv50_wndw_prepare_fb when it calls nouveau_bo_pin
23:29 imirkin: vincenttc: hmmmm i wonder if the literal MOVE doesn't wait for rendering to complete
23:29 imirkin: that'd make sense
23:29 imirkin: skeggsb: do we wait for pending fences before scheduling a move?
23:29 imirkin: i assume not
23:30 skeggsb: pretty sure that happens somewhere, whether it be in the driver or in ttm itself
23:31 imirkin: skeggsb: unrelated - did you see i have a reproducible CTXSW_TIMEOUT with a simple-ish dEQP test on a GK208B?
23:31 skeggsb: yeah, i vaguely recall seeing that go past
23:32 imirkin: any thoughts? :)
23:33 skeggsb: i'd be very surprised if it were related to gr init like the stream master thing you mentioned, because i got us *identical* to RM there when i did volta bring-up
23:33 skeggsb: (except for some sm mapping stuff on generations maxwell 2 and up)
23:33 imirkin: skeggsb: ok
23:33 skeggsb: though, i also don't have a gk208b, so, it's feasible there could be something there too
23:33 imirkin: fwiw i gave it a once-over again
23:34 imirkin: i did only look at a GK208 boot log
23:34 imirkin: not GK208B
23:34 imirkin: [and didn't find any discrepancies]
23:34 skeggsb: i usually hack the driver to log all reg writes for gr init, and diff it against a mmiotrace of RM
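The diffing workflow skeggsb describes could be sketched roughly as follows. The one-write-per-line `W <reg> <value>` format is invented for illustration; real mmiotrace output and any register logging hacked into the driver look different and would need their own parsers.

```python
import difflib


def parse_writes(text):
    """Parse lines of the hypothetical form 'W <reg-hex> <value-hex>'."""
    writes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "W":
            writes.append((int(parts[1], 16), int(parts[2], 16)))
    return writes


def diff_writes(ours, theirs):
    """Return only the differing writes, prefixed with '-' (ours) or '+' (theirs)."""
    a = [f"{reg:06x} <- {val:08x}" for reg, val in ours]
    b = [f"{reg:06x} <- {val:08x}" for reg, val in theirs]
    diff = difflib.unified_diff(a, b, "nouveau", "rm", n=0, lineterm="")
    # Drop the file headers and hunk markers, keep just the changed writes.
    return [line for line in diff if not line.startswith(("---", "+++", "@@"))]
```

Comparing the write sequence in order, rather than as a set, also catches writes that are present on both sides but issued in a different order.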
23:35 imirkin: i'll try to generate one with the GK208B and see if there's something funky. ideally someone else on a different kepler can check if it happens there too
23:35 imirkin: or if i'm just extremely lucky
23:35 skeggsb: "lucky" ;)
23:36 imirkin: you know how it goes
23:36 imirkin: i picked a board to plug in that i can actually run my new monitors with...
23:36 imirkin: and it has this issue =/
23:36 skeggsb: i've got a shiny new monitor now too actually, but no time yet to play with it
23:37 imirkin: and i had to fix a bunch of rotation-related bits. o well. all good now.
23:37 imirkin: trying to get my hands on a K4000 so that i can do everything all at the same time
23:47 karolherbst: skeggsb: ohh, maybe you know this: any idea why sticking vertex buffers into VRAM can cause the channel to be killed?
23:47 karolherbst: and it's working fine if we stick them into system mem?
23:48 karolherbst: I have like 3 or 4 games which lead to the channel being killed, and sticking all vertex buffers into sysmem apparently fixes that
23:51 airlied: coherency issues? or different response to out of bounds access?
23:52 karolherbst: maybe? but we get a CTXSW_TIMEOUT
23:52 karolherbst: that's why I am wondering about that issue
23:53 karolherbst: it's not like I get a trap or some other warp error or something
23:54 RSpliet: Or you're missing an error in a place where you (and the interrupt handler) don't expect it, which doesn't get cleared.
23:55 karolherbst: I would expect to get a different error message about that though
23:55 RSpliet: No, nouveau would miss it
23:55 RSpliet: And when FECS/GPCCS try to pause the engines for a context switch, they time out
23:55 karolherbst: sure?
23:55 karolherbst: there is that "nvkm_error(subdev, "INTR %08x\n", stat);" message for unhandled interrupts on the fifo
23:56 RSpliet: CTXSW_TIMEOUT is exactly that... well, 99.9% of the time it means "I tried to idle the engines, but it didn't happen in 200ms" or whatever the timeout is set to
23:56 karolherbst: I'd agree if that were our firmware
23:56 karolherbst: but because it's nvidia's... I have no idea what CTXSW_TIMEOUT exactly means
23:56 karolherbst: could be anything
23:57 RSpliet: It's the only reason why it could time out.
23:57 RSpliet: Idle engines, swap context, resume. That's all a context switch does
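RSpliet's description of a context switch (idle the engines, swap context, resume) and of how it can time out might be modelled like this. It is a toy simulation with made-up names and an arbitrary tick-based timeout, not the actual FECS/GPCCS firmware logic.

```python
class ToyEngine:
    """Hypothetical engine that needs some number of ticks to drain its work."""

    def __init__(self, ticks_until_idle):
        # None models a stuck engine (e.g. an uncleared error) that never idles.
        self.ticks_until_idle = ticks_until_idle

    def tick(self):
        # A stuck engine (None) or an already idle one (0) makes no progress.
        if self.ticks_until_idle:
            self.ticks_until_idle -= 1

    def is_idle(self):
        return self.ticks_until_idle == 0


def context_switch(engines, timeout_ticks):
    """Wait for all engines to idle, then (notionally) swap context and resume."""
    elapsed = 0
    while not all(e.is_idle() for e in engines):
        if elapsed >= timeout_ticks:
            return "CTXSW_TIMEOUT"
        for e in engines:
            e.tick()
        elapsed += 1
    # All engines idle: saving the old context, loading the new one and
    # resuming would happen here (not modelled).
    return "OK"
```

In this model a single engine that never drains is enough to produce the timeout, regardless of what originally wedged it.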
23:57 karolherbst: maybe nvidia's firmware doesn't like something and just bails out for no reason at all
23:57 karolherbst: how should we know?
23:57 RSpliet: bailing != timeout
23:58 karolherbst: and who controls what error is sent to the driver?
23:58 RSpliet: You could RE the firmware if you want to be 100% sure, but... well, you can only spend your time once
23:58 karolherbst: I mean, I could check if we miss an interrupt or something, but it looks like we always handle unknown ones as well
23:59 karolherbst: maybe using nouveau.trace sheds some light? dunno