00:39airlied: okay lavapipe win32 has something on the screen, just need to fix the stride
00:42zmike: screenshot on reddit or it didn't happen
00:42imirkin: i thought screenshots on reddit were only of things that didn't happen...
00:49anholt: airlied: nice
00:51jenatali: airlied: Sweet!
00:52anholt: airlied: reassigned your lvp asan fixes to marge, once those go through I'll see what the rebased CI run looks like.
00:52anholt: really hoping we'll be able to run on zink soon
00:52anholt: *turn on
00:55airlied: yay the vkcube is spinning
00:55airlied: now to clean up the mess
01:00zmike: airlied: are you going to put this on ci?
01:01airlied: zmike: might leave that to daniels :-P
01:02zmike: well I didn't mean "you" literally, we're a community
01:02zmike: if that's in the cards though, it'd be cool to see zink put in there too since we seem to have trouble keeping the windows build operational
01:05jenatali: Seems worth adding to the Windows CI, at least from a build perspective
01:06jenatali: Not sure how much effort it's worth writing tests for it though
01:07zmike: should just be able to run the same tests?
01:07zmike:should spin up a windows vm sometime for testing
01:08jenatali: Oh, piglit on zink on lavapipe? Sure
01:13airlied: jenatali: we also run cts on lvp
01:13jenatali: airlied: But not on Windows
01:13jenatali: Meaning the CTS isn't currently built for Windows
01:14airlied: yeah that would be the pain alright :-P
01:14jenatali: Yyyyyyep, can confirm :P
01:45airlied: zmike: not quite reddit, but I tweeted it :-P
01:45zmike:rushes to the twittersphere
01:46jenatali:counts down seconds til Phoronix article
01:48jekstrand: Uh, oh... What'd you do?
01:51jenatali: jekstrand: Lavapipe on Windows
01:57airlied: jekstrand79: just finished the lvp win32 port
01:57jekstrand79: Oh, neat
03:09airlied: weird, all cmd buffer allocs for sascha demos are hitting the loader icd magic assert, wonder what I've done wrong there
04:39airlied: jekstrand: you ever heard of cmd buffer loader magic corruption from the loader?
04:40airlied: had to do https://gitlab.freedesktop.org/airlied/mesa/-/commit/a42a942289e7c4a893faf96e8e8570049bba3f82 to make demos work
04:40airlied: and moltenvk has this https://github.com/KhronosGroup/MoltenVK/issues/689
04:50airlied: ah well the sascha demos are working on win32 as well now with that hack
05:05jekstrand: airlied: Yeah, command buffers are dispatchable objects.
05:06jekstrand: airlied: But you should be deriving from vk_object_base and calling vk_object_base_init which should take care of that for you.
05:06jekstrand: airlied: Are you memsetting your entire lvp_command_buffer struct to reset it?
05:11airlied: jekstrand: it only happens with ones that are handed back to the pool
05:11airlied: and they are corrupted when I get them back
05:13airlied:thought it was memory corruption and I've no idea how to track that on windows, so it might still be
05:14airlied: jenatali: I've added it to building in CI in my branch
05:14jekstrand: airlied: FYI: You want to get rid of LVP_DEFINE_*HANDLE_CASTS and replace them with VK_DEFINE_*HANDLE_CASTS. It'll force you to make everything derive from vk_object_base. It'll also help you catch bugs like that, I expect.
05:15jekstrand: airlied: Among other things, the VK_ versions do a few object sanity checks as part of the cast so you'll see the corruption quicker.
05:15jenatali: airlied: Application Verifier would help track corruption on Windows
05:41airlied: jekstrand: so loader_set_dispatch in the loader writes data to obj over the loader magic
05:42airlied: at least it's not random memory corruption, I guess I just don't use a debug build of the loader on linux at all
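The loader-magic bug airlied hit above can be sketched in plain C. This is a simplified stand-in, not lavapipe's actual code: the magic value matches `ICD_LOADER_MAGIC` from the real `vk_icd.h`, but the struct and helper names here are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Every dispatchable Vulkan handle (VkDevice, VkCommandBuffer, ...) must
 * begin with a loader-owned dispatch slot, which the loader validates
 * against a magic value. The value below matches ICD_LOADER_MAGIC from
 * vk_icd.h; the struct is a simplified stand-in for a driver's command
 * buffer. */
#define ICD_LOADER_MAGIC 0x01CDC0DE

typedef union {
    uintptr_t loader_magic;
    void *loader_data;
} loader_data_stub;

struct fake_cmd_buffer {
    loader_data_stub base; /* must be first, and must survive resets */
    int state;
};

static void set_loader_magic(struct fake_cmd_buffer *cb)
{
    cb->base.loader_magic = ICD_LOADER_MAGIC;
}

static int valid_loader_magic(const struct fake_cmd_buffer *cb)
{
    return cb->base.loader_magic == ICD_LOADER_MAGIC;
}

/* The buggy pattern: "resetting" a pooled command buffer with a blanket
 * memset wipes the loader magic, so the next dispatch through the loader
 * trips its magic assert. */
static void buggy_reset(struct fake_cmd_buffer *cb)
{
    memset(cb, 0, sizeof(*cb));
}
```

Re-stamping the dispatch slot on reuse (which Mesa's common `vk_object_base_init` does for drivers using the shared `VK_DEFINE_HANDLE_CASTS` machinery) is what keeps recycled command buffers valid.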
08:20tzimmermann: mripard, hi! may i ask you for a review of https://email@example.com/
08:20tzimmermann: let me know if there's something i can review for you
08:35pinchartl: possibly not strictly on topic for this channel, but I know there's lots of experience here on this domain: is it possible to configure a CI/CD pipeline on gitlab.fd.o without a .gitlab-ci.yml file in the repository ? I'm considering a use case of upstream kernel development, which won't allow gitlab-specific files to be committed to the master branch, with CI pipelines
08:37hifi: not specific to fd.o gitlab but you can always create a repo that has the CI config which then in turn clones (with --depth 1 or something) the actual repository you want the code from
08:38hifi: that won't give you automatic CI runs on push directly, though
08:38pinchartl: I've just realized that the path to .gitlab-ci.yml can be an http URL
08:39pinchartl: sorry for the noise :-)
08:39hifi: well that makes it a lot easier
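For reference, the approach pinchartl landed on might look like the stub below. The URL is a placeholder; on GitLab the per-project "CI/CD configuration file" setting (under Settings → CI/CD) also accepts a path in another project or a raw http(s) URL, so the kernel tree itself can stay free of gitlab-specific files.

```yaml
# Hypothetical stub: either commit only this minimal file, or skip it
# entirely and point the project's "CI/CD configuration file" setting at
# ".gitlab-ci.yml@other-group/ci-config" or an http(s) URL.
include:
  - remote: 'https://ci.example.org/kernel-pipeline.yml'
```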
08:40mripard: tzimmermann: I'll give it a look today :)
08:41udovdh: Is there a howto on the net describing what I need to change in my mesa build when I switch to wayland? (if at all?)
08:42emersion: just make sure you have "wayland" in -Dplatforms
08:44udovdh: that is all? no other requirements that I might need? It works but I ask just to be sure
08:57udovdh: Currently the conf is like: meson configure . --prefix /opt/xorg/ -Dbuildtype=release -Ddri-drivers= -Dgallium-drivers=radeonsi -Dgallium-xvmc=auto -Dgallium-vdpau=true -Dgles1=false -Dgles2=true -Dgallium-xa=auto -Dgallium-opencl=disabled -Ddri3=true -Dplatforms=auto,x11,wayland -Dshared-glapi=true -Dglx=dri -Dglx-direct=true -Dgbm=true -Dosmesa=false -Dglvnd=true -Dlmsensors=true -Dvulkan-drivers=amd -Dgallium-va=true
08:58tzimmermann: thanks mripard
10:14MrCooper: kiryl: FWIW, at least in theory your issue shouldn't happen with a recent AMD GPU
12:36mslusarz: hey guys, I'm trying to figure out why X freezes when an application triggers the kernel's gpu hang detection, and I discovered that X isn't affected directly by the hang - it freezes on the first glXSwapBuffers after mesa recreates its GL context
12:36mslusarz: it seems it's not the implicit glFlush in glXSwapBuffers that is causing the problem, but the swap part - from mesa perspective freeze happens on xcb_flush after xcb_present_pixmap
12:36mslusarz: this is on Intel gen9 GPU, mesa 20.3.4, xserver 1.20.10 with modesetting driver and my own fork of mesa on the application side (glretrace in this case)
12:38mslusarz: it seems X doesn't notice there's anything wrong and acceleration of new applications keeps working, just X can't display new content
12:40mslusarz: I would appreciate it if you could give me some advice on how to proceed
12:43mslusarz: by "X freezes" I mean it switches between (last?) 2 frames back and forth
13:48MrCooper: mslusarz: it's likely because glamor doesn't support https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_robustness.txt yet (will be pretty tricky to add though), so all its GL drawing commands are dropped on the floor after the GPU recovery
13:50mslusarz: MrCooper: if I exit() on the Mesa side anywhere between the failed ioctl and glXSwapBuffers, the X server doesn't freeze
13:53mslusarz: so I think X is not affected by what happens with application context
13:54mslusarz: something about the drawable passed to glXSwapBuffers is not right, but I have no idea how to debug this...
13:56mslusarz: uhm, actually, let me read the extension text, I thought it's about something else
14:11danvet: vsyrjala, did you respin your vblank_restore fix and I missed it?
14:20zmike: MrCooper: re: that piglit expects thing, anything I can do to help move that along?
14:21MrCooper: come up with a solution for the issue I described :)
14:21zmike: I'm not sure I fully understand the issue tbh
14:21zmike: haven't gotten very deep into understanding our ci pipeline yet
14:25MrCooper: the issue is that ci-expects/ is in the rules of test jobs only, not of the container/build jobs they depend on
14:25vsyrjala: danvet: not yet. i wanted to test it a bit first, but psr1+hsw totally borked so actually can't test
14:25vsyrjala: i guess i'll just send it out anyway then
14:25MrCooper: so if nothing else triggers those container/build jobs, there's no pipeline due to invalid YAML
14:26danvet: MrCooper, hm I thought iris would recover internally
14:27danvet: and arb_robustness is only for when you want the driver to not do that because you're dealing with untrusted shaders and stuff like that
14:27danvet: i.e. webgl
14:27danvet: Kayden, ^^ or am I wrong here
14:27danvet: vk ofc just keels over and drops vk_device_lost on the application/compositor
14:27MrCooper: maybe it does recover internally, but somehow "loses the connection" for buffers shared via DRI3/dma-buf?
14:28danvet: mslusarz, ^^
14:28danvet: MrCooper, yeah that's maybe more plausible
14:28danvet: maybe the imports all fall to pieces in the new context
14:28danvet: but shouldn't happen either
14:29danvet: mslusarz, #intel-3d might help with this, if it's intel specific
14:29danvet: mslusarz, hm, what's your renderer string? I think iris isn't yet the default for gen9
14:33yshui`: is Xserver expected to keep functioning after GPU resets?
14:34yshui`: it's not the case on AMD gpus
14:34zmike: depends on the driver
14:36yshui`: the kernel driver? the gl implementation? or ddx?
14:37mslusarz: danvet: iris is the default driver on gen9
14:38mslusarz: danvet: I don't know if this issue is Intel specific
14:39karolherbst: ufff.. the cryptocurrency bullshit is in such a state, the GPU vendors produce "mining cards" so that gamers can have their GPUs for normal prices :D
14:41yshui`: danvet: doesn't seem to be intel specific
14:44MrCooper: with amdgpu it depends on what kind of reset is required; in theory it's also possible for radeonsi to recover internally as seems to be the case for iris, but in practice most of the time it's not, in which case both the app and the display server (and glamor in Xwayland) would need to support the robustness functionality to be able to recover without restarting the display server
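The robustness recovery MrCooper describes roughly follows this shape (pseudocode: the entry points and enums are the real ones from GLX_ARB_create_context_robustness and GL_ARB_robustness, but the surrounding flow is only a sketch):

```
ctx = glXCreateContextAttribsARB(dpy, fbconfig, share, True,
          { ..., GLX_CONTEXT_RESET_NOTIFICATION_STRATEGY_ARB:
                 GLX_LOSE_CONTEXT_ON_RESET_ARB })

every frame:
    if glGetGraphicsResetStatusARB() != GL_NO_ERROR:
        # one of GUILTY/INNOCENT/UNKNOWN_CONTEXT_RESET_ARB
        poll until the status drops back to GL_NO_ERROR
        destroy ctx and create a fresh one
        re-upload every texture/buffer, rebuild FBOs
        # a display server additionally has to re-import all
        # client buffers shared via DRI3/dma-buf
```

Both sides of a shared-buffer setup (app and display server / glamor) need to do this dance, which is why a client-only recovery can still leave the server showing stale content.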
14:45MrCooper: on the bright side, with GNOME 40 one will be able to kill Xwayland, and mutter will just restart it on demand :)
14:46yshui`: I see.
14:46mslusarz: the thing is that if I modify iris to not attempt to recover and just abort, X survives just fine, with new GL apps unaffected
14:46yshui`: Does wayland compositors support this kind of recovery in general?
14:47emersion: wlroots has been supporting this for a long time
14:47emersion: note, killing xwayland also kills all x11 clients
14:48MrCooper: emersion: what exactly is "this"? :) I suspect yshui` might have asked about recovering from GPU hangs via robustness
14:48emersion: ah, no, xwayland restarts
14:49emersion: robustness is still a TODO for everyone AFAIK
14:50yshui`: maybe in mslusarz 's case, iris recovery actually broke X's GL context?
14:53mslusarz: yshui`: recovery is on the client side, so I'm not sure how it can affect the server side...
14:54yshui`: the server side has a GL context that needs to be recovered as well.
14:58mslusarz: yshui`: so why does X survive if the application just aborts on hang?
14:59yshui`: i would like to know as well
14:59mslusarz: (I'll verify if X even gets the hang notification in a moment)
15:06mslusarz: nope, it doesn't
15:08yshui`: so recovering a client side context breaks the server side context somehow?
15:09pq: Does the X server actually keep on flipping but not rendering, or does it stop flipping?
15:09pq: if X server stops flipping, maybe the crashed app context managed to submit a fence to X server, and that fence never signals as the app context got destroyed/reset?
15:10mslusarz: yshui`: I'm not sure, but I think the buffer shared between the X server and application somehow gets "broken"
15:10pq: ...reset but not destroyed, maybe? and aborting the app causes the context to be fully destroyed
15:10mslusarz: pq: how can I verify if it stops flipping?
15:11pq: mslusarz, uhh... gdb and breakpoints?
15:11mslusarz: but on what, I'm not familiar with X server code
15:11pq: drmModePageflip and drmModeSetCrtc
15:11pq: me neither
15:12yshui`: from my experience with AMDGPU, usually after gpu resets, X would flash between 2 broken frames
15:12yshui`: is that the same with intel? mslusarz
15:13mslusarz: yshui`: yup, 2 frames keep swapping
15:13yshui`: that would mean it's still flipping?
15:32mslusarz: pq: yeah, X seems to keep flipping
15:32pq: that shot in the dark missed then :-)
15:33danvet: tzimmermann, we might also need to fix up dma-buf importing to make sure it uses the right struct device
15:33danvet: otherwise the dma-buf is on the wrong device
15:33yshui`: mslusarz what did you do to determine if X has gotten the reset notification?
15:33mslusarz: (I had to break on callers of drmModePageFlip inside of the X server, instead of drmModePageFlip itself, because gdb would just hang otherwise, weird)
15:35mslusarz: yshui`: I put a breakpoint on code that handles EIO from command submission ioctl
15:36ajax: hm. the -Dglx=xlib build target stopped compiling recently. can i use that as an excuse to just delete it (pretty please)?
15:45yshui`: mslusarz looks like iris only tries to recover if it got an EIO. but there's also the "innocent" reset, and I don't think iris automatically recovers in that case?
15:46yshui`: maybe Xserver still needs to recover itself in that case, but doesn't. and when the client side doesn't try to recover, the server side keeps working by accident?
15:48mslusarz: that would be sad...
15:49mslusarz: I want my software to crash and burn on any failure, not work by accident ;)
15:49ajax: i'm not entirely sure xserver can meaningfully recover on reset
15:49yshui`: that's just a hypothesis
15:50ajax: well, it could, if X had a way to send clients events about losing pixmap contents, and your client happened to know how to handle it.
15:51yshui`: ajax: isn't that what Expose events are for?
15:51ajax: but for every existing client, we'd have lost all of their pixmaps' contents
15:51ajax: those are events about Windows
15:51ajax: you could name a pixmap instead of a window in the event, i suppose, but literally no client anywhere is expecting you to do that yet
15:52yshui`: hmm, but i suppose having a few glitched windows is better than the whole X server stopping working?
15:53mslusarz: yshui`: +1
15:55mslusarz: how are buffers shared between the X server and client? who is the owner?
15:57imirkin: X server owns everything GPU-related
16:00yshui`: mslusarz: I think it's DRI3PixmapFromBuffer, which creates an X pixmap from some sort of file descriptor
16:00yshui`: and then you can Present with that pixmap
16:03yshui`: imirkin i thought the buffers are indeed shared, so no one side really owns them?
16:04imirkin: with present, under certain conditions, the X server may indeed decide to flip to the given image
16:04imirkin: however all (regular) pixmaps/etc are hosted by the X server
16:28MrCooper: the point here is rather the pixmap storage is shared between server & client via DRI3; Present works with "normal" pixmaps (whose storage only exists in the server) as well
16:29imirkin: right, yes.
16:31MrCooper: mslusarz: you wrote "acceleration of new applications keeps working, just X can't display new content"; it's not clear if that means new applications are displayed correctly or not
16:36mslusarz: MrCooper: they are not displayed; I haven't verified it thoroughly, but they seem to work correctly, e.g. piglit shader test that probes some pixels succeeds
16:37imirkin: mslusarz: a piglit shader test that does front-buffer rendering/verification?
16:37MrCooper: that does sound like the client's recovery affects the X server's context, i.e. a kernel i915 driver issue
16:37imirkin: if it's back-buffer (/fbo), then it's all inside the client
16:38mslusarz: imirkin: oh... I used -fbo -auto
16:38imirkin: yeah, so if you have DRI3, then there's (nearly) zero X server involvement
16:38MrCooper: -fbo doesn't display anything on the display server
16:39MrCooper: try something like glxgears?
16:39mslusarz: how can I check that? without -fbo -auto shader_runner doesn't say whether probe succeeded
16:40MrCooper: you can use -auto without -fbo
16:40MrCooper: though normally the result should be reported even without -auto
16:40mslusarz: glxgears works (it prints that it renders 60FPS every few seconds), but I don't know what's the contents
16:41MrCooper: anyway, no permutation of piglit test options can verify that the contents are actually displayed correctly
16:41imirkin: if you use -auto, then it might still not care
16:41imirkin: you have to pick the right test
16:41imirkin: which uses front-buffer rendering / verification
16:42imirkin: look for *front* (but not front-facing :) )
16:42MrCooper: the front buffer is just another DRI3 buffer, not what's actually displayed
16:49mslusarz: imirkin: https://gitlab.freedesktop.org/mesa/piglit/-/blob/master/tests/general/read-front.c ?
16:51imirkin: mslusarz: yeah. but with DRI3, MrCooper is probably right, and there's just no way to test it with a GL application
16:52MrCooper: you'd have to get the window contents via another X mechanism
16:52imirkin: right. i meant "via existing piglit tests" :)
16:53imirkin: i guess even with DRI2 you can't. nvidia did something different, and that's why we show windows even with -auto now, but it didn't matter for mesa / open stack
16:54mslusarz: uhm, interesting... read-front -auto succeeds, but without -auto it says observed != expected
16:55imirkin: mslusarz: what if you do PIGLIT_NO_WINDOW=0 ? (iirc 1 is still the default)
16:55imirkin: (for -auto)
16:56mslusarz: fails with both 0 and 1
16:57imirkin: you said -auto succeeds
16:58imirkin: you're trying to fix the regular case. i'm trying to break -auto ;)
16:58mslusarz: oh, sorry, I misunderstood
16:59mslusarz: with -auto both 0 and 1 succeed
17:00imirkin: huh ok
17:00imirkin: that's odd. don't know what -auto would be doing differently then
17:00imirkin: i thought it might be skipping the window creation which somehow affects things
17:00imirkin: but apparently that doesn't end up mattering
17:11yshui`: MrCooper use Composite to grab the window pixmap?
17:12MrCooper: or even just XGetImage :)
17:14mslusarz: I have to go for now, I'll be back tomorrow
17:29yshui`: A client recreating its context isn't normally going to break the server, so I think the GPU being reset has to have something to do with this.
17:38alyssa:tries to understand how coordinate shaders are legal
17:38alyssa: (in the presence of side effects)
17:43alyssa: "You may not assume that a Vertex Shader will be executed only once for every vertex you pass it. It may be executed multiple times for the same vertex" oh
17:54HdkR: alyssa: Alternative story. Tons of applications don't think the vertex shader will execute twice and result in fun broken behaviour for them :)
17:54HdkR: If you're lucky they are just storing to memory based off vertid and the resulting double execution doesn't hurt much
17:54imirkin: alyssa: HdkR: alt-alternative story: many drivers don't support images/ssbo's in vertex shaders, so applications don't rely on them
17:55alyssa: imirkin: fair :p
17:55alyssa: actually thinking about geom shaders which the spec is even less clear about
17:56imirkin: geom runs once per primitive
17:56imirkin: but even there, it could be processed multiple times on tilers/etc
17:56alyssa: ^^ yeah
17:56imirkin: such is life.
17:56imirkin: which is why specs only require this stuff for frag / compute
17:57alyssa: (trying to figure out if it's legal to run the geom twice to figure out the sizes of everything ahead-of-time)
17:57imirkin: (maybe that's not *why*, but it's a nice side-effect)
18:05MrCooper: yshui`: a normal user-space process shouldn't be able to explicitly trigger a GPU reset which affects other processes
18:06MrCooper: at least it should have to work harder :)
18:07yshui`: MrCooper aren't we a pretty long way from being able to prevent that?
18:08alyssa: macOS can't even do that...
18:08yshui`: if the application submits a broken program, that could easily hang the gpu
18:08imirkin: just solve ATM - what's the big deal
18:08MrCooper: hence my clarification :) obviously it's always possible indirectly by making the GPU hang, but it still shouldn't be possible by just calling a "break other contexts please, kthxbye" ioctl
18:14yshui`: so it's about keeping buffers and stuff alive across reset?
18:18anholt: pinchartl: iirc you can also say "always use this branch of mine for the gitlab-ci.yml" in your project's ci config
18:22pinchartl: anholt: I've seen that the CI config can also point to a separate project
18:22yshui`: imirkin: what's ATM, the Turing Machine?
18:23pinchartl: so it should support all I need
18:24imirkin: yshui`: the halting problem
18:24imirkin: i.e. whether a turing machine will hit a halting state or not
18:25yshui`: imirkin: i wonder if it makes sense to create a total shader language
18:25imirkin: (which is not solvable using current methods in finite time)
18:25imirkin: (i forget if it's proven to be unsolvable on a turing machine or not)
18:26yshui`: yeah the halting problem is undecidable
18:32alyssa: yshui`: IIRC unextended gles2 is close
18:45yshui`: looks like gles2 shader still has unrestricted loops?
18:47HdkR: in gles2 it is valid for a shader to fail compiling if the loop can't be fully unrolled
18:47ajax: the answer to the halting problem is yes. eventually entropy will bring the computer to a halt.
18:48alyssa: the answer to "doctor, is it terminal?" is necessarily always yes, including for the common cold
18:49yshui`: @ajax to be fair practical computers don't normally have infinite storage anyway
18:50imirkin: alyssa: in the end, that di-hydrogen monoxide will get you
18:50HdkR:hands imirkin a dad-joke sticker
18:51alyssa: DHMO is toxic at levels found in the Atlantic Ocean, putting millions of Maritimers at risk!
18:51ajax: thousands die of accidental inhalation of dhmo every year and it's found in 100% of malignant tumors
18:52ajax: ban this etc
18:52FLHerne: HdkR: That may be true, but when I screwed up and put an infinite loop in an ES2 shader it hung my GPU under radeonsi
18:52imirkin: in case others are concerned: https://www.dhmo.org/facts.html
18:52alyssa: FLHerne: MAY not MUST
18:52FLHerne: Or maybe it was intel then, I can't remember which laptop that was :p
18:53HdkR: I'd expect any GPU that supports real branching will just infinite loop and force a job kill or GPU reset
18:53karolherbst: HdkR: sure? :D
18:54HdkR: karolherbst: Plez support job killing :<
18:54karolherbst: HdkR: I would if the hw would let me
18:54imirkin: heh, that site also has a link to the klein bottle guy
18:54FLHerne: Ok, but it means restricting to ES2 is only a hypothetical solution, not something that'll help with existing drivers
18:55FLHerne: And now I wonder how WebGL doesn't blow up more often than it does
18:55karolherbst: FLHerne: because browsers wrap shit
18:55karolherbst: they don't pass in the shader without messing around with it :p
18:55FLHerne: I mean, there was that glfuzz thing that would hang my laptop with webgl, but that seems to be the exception
18:55imirkin: FLHerne: don't need infinite loops. drawing a single primitive can hang an nvidia gpu (with blob drivers too)
18:56karolherbst: heh :D
18:56karolherbst: how'd you do that?
18:56imirkin: deqp did that...
18:56karolherbst: just stupid shader with an infinite loop?
18:56imirkin: trivial shaders
18:56imirkin: max out tessellation, max out geometry outputs, watch the fireworks from drawing a patch
18:57imirkin: i suspect it has some kind of internal buffer which doesn't handle the million+ triangles that end up getting generated
18:57karolherbst: probably :D
19:04alyssa: geom/tess is awful
19:05imirkin: esp when you do 1024 vertices output, 32 instances, and tess factors at 64 :)
19:06imirkin: one patch can do a lot of damage :)
19:14HdkR: karolherbst: There should be a way to kick a "stuck" job other than a GPU reset right?
19:14karolherbst: HdkR: should as in "they should add one" or as in "there probably is one?"
19:15HdkR: There probably is one
19:15karolherbst: without turning on debugging?
19:15karolherbst: and for graphics?
19:15karolherbst: I am sure all of that is "trivial" for the compute engine, but graphics?
19:16HdkR: I believe so. Might be a generational thing
19:17karolherbst: ohh wait
19:17karolherbst: with "GPU reset" you mean a full GPU reset, right?
19:18karolherbst: ahh yeah.. nouveau doesn't support that :p
19:18karolherbst: you can kill the channel and restart the falcons
19:18karolherbst: and that's how we recover from stuck jobs
19:19HdkR: Sounds reasonable
19:37ajax: anholt: does building with asan set a #define that we could conditionalize the dlclose on?
19:37anholt: ajax: yeah, there's one that we do in egl
19:38anholt: but for vk, the loader does the dlstuff 
19:38ajax: alternatively, would you accept an LD_PRELOAD that nerfs dlclose for this particular test set?
19:39anholt: would be pretty into that
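The dlclose-nerfing preload ajax suggests can be a one-function shim. A sketch, with an illustrative file name and build line: keeping dlopen'd drivers mapped means asan's leak reports still have symbols for them.

```c
/* nerf_dlclose.c -- hypothetical sketch of the LD_PRELOAD shim idea:
 * interpose dlclose() as a no-op so shared objects are never unmapped
 * and leak-checker backtraces into them keep their symbol info.
 *
 * Build:  gcc -shared -fPIC -o nerf_dlclose.so nerf_dlclose.c
 * Use:    LD_PRELOAD=./nerf_dlclose.so ./some-test
 */
int dlclose(void *handle)
{
    (void)handle; /* deliberately leak the library mapping */
    return 0;     /* 0 == success, so callers are none the wiser */
}
```

Because LD_PRELOAD symbols win over libc's, every dlclose in the process becomes a successful no-op, with no changes to the code under test.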
20:08ajax: airlied: i know you were working on scene overlap for llvmpipe at one point, did you ever get to letting the PutImage overlap too?
20:16airlied: ajax: no, never got to it, was messy to navigate the glx/dri code
20:16airlied: https://gitlab.freedesktop.org/airlied/mesa/-/tree/llvmpipe-wip-scenes is an updated overlap branch that works for vulkan as well
20:16airlied: I was considering trying to make it works for the vulkan wsi where at least I don't have to deal with the glx/dri interface
20:17airlied: fencing the xshmputimage was where I think I was getting stuck
20:17airlied: we have a pretty large Xsync in there now
20:54alyssa: dschuermann: What's the difference between opt_sink and opt_move?
20:57dschuermann: global vs local
20:57alyssa: got it, thanks
20:57alyssa: so I want both?
20:57dschuermann: yes, first sink, then move
20:57alyssa: (but mostly want sink, since local we can do in the backend sched)
20:57alyssa: [well. will be able to anyway]
20:58dschuermann: makes sense
21:00alyssa: sink fixes spilling on a pathological case hit in deqp, thanks :+1:
21:01alyssa: (A shader that writes out a constant vector a bunch of times, which without sink meant huge numbers of moves to constants at the start)
21:28alyssa: I will say dEQP-GLES31.functional.ssbo.* is awfully slow for me :(
21:28alyssa: I guess it does encompass 2000 tests :)
21:29imirkin: is each tests slow, or just lots of tests?
21:29anholt: I would bet that nir_load_store_vectorize will help you.
21:29alyssa: a mix, I think most are fast and I just have some dumb cases to deal with for RA :)
21:29imirkin: there are *some* tests in there which iirc are slow
21:29imirkin: esp if you run * rather than grepping in the master.txt file
23:29anholt: apinheiro: things like cmdcopybuffer are really unfortunate for v3d which has switching between compute and render actually require going back to the kernel.
23:30anholt: but unless you've done something clever, you've already got separate bin and render jobs for each
23:31anholt: apinheiro: I wouldn't expect each cmdcopybuffer to be a heavyweight operation (multiple trips to the kernel)
23:32apinheiro: anholt, no, each cmdcopybuffer is not really heavyweight
23:32apinheiro: the main issue here is an oom, as the 15k copies gets accumulated
23:32apinheiro: > but unless you've done something clever, you've already got separate bin and render jobs for each
23:32apinheiro: well, it is now when we are evaluating to do clever things or not
23:32anholt: doesn't each of the cmdcopybuffers end up with a separate V3DV_JOB_TYPE_GPU_CL?
23:33apinheiro: until the end of last year we were more focused on get things done
23:33apinheiro: > doesn't each of the cmdcopybuffers end up with a separate V3DV_JOB_TYPE_GPU_CL?
23:33apinheiro: as it is working now
23:34apinheiro: for each copybuffer, we create all the render/binning/etc that is part of the v3dv_job_type_gpu_cl
23:34anholt: so each of those is a job ioctl to the kernel, and a bin and render interrupts. plus they've probably all got separate CL bos?
23:34apinheiro: and it is being included on the current cmd buffer
23:34anholt: I wonder how much you'd get out of sharing CL bos between jobs
23:35apinheiro: yes, that would be an interesting thing
23:35apinheiro: but as mentioned, for this specific app
23:35apinheiro: it uses 15k copies on the same cmd buffer
23:35apinheiro: and we get out of memory at ~2.6k
23:36apinheiro: so Im personally not sure if sharing CL bos between jobs would cover that gap
23:36airlied: so v3d has to record ioctls into the command buffers for replays, ouch
23:36anholt: 5x reduction is not totally implausible to me from starting the new CL's commands at the end of the last CL job's buffers.
23:36anholt: airlied: yep. actually pretty required by the different compute vs graphics queues.
23:37anholt: I think I had some idea at some point of being able to chain bin->bin->bin and render->render->render to avoid an ioctl per subsequent graphics job.
23:38airlied: anholt: is that an artifact of bad hw design or bad kernel api design?
23:39apinheiro: airlied, what do you mean on "for replays"? in this context is the same that for execute?
23:39anholt: we don't have a top level queue in front of the binner, renderer, and csd. there's just "stuff a new bin job in the registers"
23:39apinheiro: > 5x reduction is not totally implausible to me from starting the new CL's commands at the end of the last CL job's buffers.
23:40apinheiro: anholt, a 5x reduction would still be short here ;)
23:40apinheiro: although clearly could help in other cases
23:40anholt: 5.8x, whatever :)
23:40apinheiro: and after all, I guess that using less memory would be good in general
23:40apinheiro: fwiw, im right now on analysis mode
23:41apinheiro: we were not expecting an app using 15k copies on the same cmd buffer, so we are not really sure what apps/games really do
23:42apinheiro: out of curiosity I was checking other drivers, and then check how apps use the copies (copytobuffer, copy to image)
23:42apinheiro: and see how much priority would have start to code stuff like this
23:43apinheiro: anholt, but thanks a lot of the advice
23:44apinheiro: btw, "checking apps" here is mostly the quake vulkan ports, ue4 demo and now the doom3 vulkan port
23:44apinheiro: that are basically the most complex vulkan apps that we found and are able to run
23:44anholt: apinheiro: at some point, for vulkan-in-general, probably going to need to tackle a multi-submit interface in the kernel where bin is in hardware before render is queued, but not wait for bin finished, and use the CL semaphores to have the HW schedule between them
23:45anholt: and also see about loading more than one job in CTnQ at a time
23:49anholt: apinheiro: reason I've been thinking about those last couple things is that apparently it's been important https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2371
23:49anholt: (I think we did this for tu as well)
23:50apinheiro: hmm interesting
23:54apinheiro: anholt, thanks for all the tips. I will keep looking at how those apps use the copies, and tomorrow will talk with Iago
23:56airlied: anholt: would you be faster just executing memcpys instead of using the gpu :-P
23:57anholt: airlied: I had seriously considered whether there should be a job type that is "use the generic dma engine"