00:00fdobridge: <gfxstrand> Why "domain always valid VM"? What are you thinking are the exact semantics of the bit?
00:01fdobridge: <airlied> I just took amd's name, but we should probably come up with a better one
00:01fdobridge: <airlied> domain was the nicest place to put a flag
00:01fdobridge: <airlied> I don't have a spare flags field to stick things into here
00:02fdobridge: <gfxstrand> Yeah, domain seems reasonable. Doesn't seem any more crazy to go in there than mappable. 🤷🏻♀️
00:02fdobridge: <airlied> yeah the tile flags seemed a worse place
00:03fdobridge: <airlied> we could just call it USE_VM_RESV
00:04fdobridge: <gfxstrand> `NON_SHAREABLE`? I kinda hate that, though.
00:04fdobridge: <airlied> it's a question of whether we name it after what it does, or what it implies
00:05fdobridge: <gfxstrand> All the words I'm coming up with like "local" seem like they could mean "stays on device" which isn't what we want.
00:05fdobridge: <gfxstrand> I kinda don't want the name to imply implementation.
00:06fdobridge: <gfxstrand> But IDK. 🤷🏻♀️
00:07fdobridge: <gfxstrand> Naming things is hard
00:07fdobridge: <airlied> that leads to NON_SHAREABLE, radv uses NO_INTERPROCESS_SHARING internally
00:08fdobridge: <gfxstrand> I don't like "interprocess". What if the same process opens the file twice?
00:08fdobridge: <airlied> yeah NO_SHARE is the simplest one I have
00:08fdobridge: <gfxstrand> No share or no export
00:08fdobridge: <gfxstrand> IDK which is better
00:09fdobridge: <airlied> NO_EXPORT seems like someone might think it's okay for import, though if you can't export then you can't import 😛
00:10fdobridge: <gfxstrand> Yeah
00:10fdobridge: <gfxstrand> And since you can't set new flags on import, you can't import either
00:10fdobridge: <airlied> pushed out NO_SHARE
00:10fdobridge: <gfxstrand> Okay
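The semantics settled on above can be sketched roughly as follows. This is a minimal illustration, not the actual UAPI: every `SKETCH_`/`sketch_` name, the bit positions, and the error value are hypothetical. It only shows the idea that the flag rides along in the domain bitmask, is fixed at creation, and makes export fail (and, since an import cannot set new flags, such a BO can never arrive by import either).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; not the real UAPI numbers. */
#define SKETCH_DOMAIN_VRAM     (1u << 0)
#define SKETCH_DOMAIN_GART     (1u << 1)
#define SKETCH_DOMAIN_MAPPABLE (1u << 2) /* an existing non-placement flag */
#define SKETCH_DOMAIN_NO_SHARE (1u << 3) /* BO is never shared outside its VM */

struct sketch_bo {
    uint32_t domain; /* placement bits plus flags, fixed at creation */
};

/* As the earlier USE_VM_RESV name suggests, a NO_SHARE BO can rely on
 * the VM's reservation object instead of carrying its own. */
bool sketch_bo_uses_vm_resv(const struct sketch_bo *bo)
{
    return (bo->domain & SKETCH_DOMAIN_NO_SHARE) != 0;
}

/* Exporting a NO_SHARE BO is refused. Returns 0 on success,
 * -1 (a hypothetical error value) on refusal. */
int sketch_bo_export(const struct sketch_bo *bo)
{
    if (bo->domain & SKETCH_DOMAIN_NO_SHARE)
        return -1;
    return 0;
}
```

Naming the bit after what it implies (no sharing) rather than what it does (use the VM reservation) keeps the implementation detail out of the interface, which is the trade-off discussed above.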
00:24fdobridge: <airlied> pushed out renames, and fixed up a few more kernel side bits, seeing some regressions with memory types, will chase those down
00:26fdobridge: <airlied> ah that was some other local crap
00:56fdobridge: <gfxstrand> 👍
01:00fdobridge: <airlied> okay fixed one regression in mesa side, kicking cts off
01:00fdobridge: <gfxstrand> 🥳
01:38fdobridge: <esdrastarsis> on gsp reclocking
01:38fdobridge: <esdrastarsis> https://cdn.discordapp.com/attachments/1034184951790305330/1132849264909877339/20230723_22h34m13s_grim.jpeg
02:09fdobridge: <airlied> hmm screwed up something, cts doesn't make it out alive, will have to dig in a bit more
04:12fdobridge: <airlied> might have to get dakr to figure out what I've done bad here, things seem to blowup randomly
05:32fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> At what settings?
05:39fdobridge: <airlied> okay got a complete cts run to pass with the other fix on that branch
05:54fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Also fix the wide Putin car :evil_gears:
05:58fdobridge: <gfxstrand> Do I need all the fixes in that branch?
06:06fdobridge: <airlied> the last 3 patches are the only ones for mesa interactions
06:12fdobridge: <gfxstrand> CTSing now
06:12fdobridge: <gfxstrand> It looks like it's still going to take 1.5 hours.
06:12fdobridge: <gfxstrand> I've got other theories about why things are taking forever.
06:12fdobridge: <gfxstrand> Fixing the submit scaling was still important though so I'm glad that's done.
06:14fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I wonder if NVK can survive an overnight run of Unigine Heaven with GSP and new UAPI 🤔
06:15fdobridge: <gfxstrand> @airlied Congratulations on doing something in an hour that the i915 team hasn't been able to do in 5 years and counting. 🙃
06:16fdobridge: <gfxstrand> @airlied Congratulations on doing something in a few hours that the i915 team hasn't been able to do in 5 years and counting. 🙃 (edited)
06:22fdobridge: <airlied> @phomes so the reason I don't think mme macros work on dispatch is we don't upload mme macros for the compute class
06:23fdobridge: <airlied> might need to create a set of compute mme macros separate for those ones
06:24fdobridge: <airlied> though I'm not sure if the two classes have separate mme storage
06:26fdobridge: <gfxstrand> According to the headers, there's no way to upload MMEs for compute on Turing+
06:27fdobridge: <gfxstrand> Maybe it exists but not in the headers but I kinda doubt it
06:30fdobridge: <airlied> it's weird, it's in the ampere B headers I think
06:32fdobridge: <gfxstrand> Maybe they added it back?
06:32fdobridge: <gfxstrand> Maybe they wanted it on the lovelace cards because they tossed 3D but still needed MME?
06:32fdobridge: <gfxstrand> 🤷🏻♀️
06:33fdobridge: <gfxstrand> Or maybe it exists on Turing and they're just lying in the headers?
06:40fdobridge: <airlied> dang just got another strange gpu crash taking out the world with the resv change
07:43fdobridge: <gfxstrand> @airlied So, the crucible test is definitely 2x as fast with the new patches but it's still doing something per-BO per-EXEC.
07:44fdobridge: <gfxstrand> 1000 BOs doubles EXEC time from 66 to 122 us/exec
07:45fdobridge: <gfxstrand> Still better than the 212 us/exec without NO_SHARE
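Taking the quoted figures at face value, the remaining per-BO cost works out to about 56 ns per BO per EXEC, and 212 to 122 us/exec is a speedup of roughly 1.74x. A small sketch of that arithmetic (the function names are made up for illustration):

```c
/* Extra EXEC time attributed to each BO, in nanoseconds.
 * base_us:   time per EXEC with few BOs, in microseconds
 * loaded_us: time per EXEC with bo_count BOs, in microseconds */
double per_bo_overhead_ns(double base_us, double loaded_us, unsigned bo_count)
{
    return (loaded_us - base_us) * 1000.0 / bo_count;
}

/* How many times faster new_us is compared to old_us. */
double speedup(double old_us, double new_us)
{
    return old_us / new_us;
}
```

For example, `per_bo_overhead_ns(66.0, 122.0, 1000)` gives the per-BO figure for the NO_SHARE case, and `speedup(212.0, 122.0)` compares the runs with and without NO_SHARE at 1000 BOs.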
07:50fdobridge: <airlied> yeah I did some perf record traces earlier and I think we need to do something different for bo validation
07:52fdobridge: <airlied> @phomes I did some cond render hacks https://gitlab.freedesktop.org/airlied/mesa/-/commits/nvk-cond-render-hacks
07:52fdobridge: <airlied> it's pretty horrid though
07:52fdobridge: <airlied> fails all the local memory tests
07:55fdobridge: <gfxstrand> Oh, that's ughly...
07:56fdobridge: <gfxstrand> All because it randomly requires 16B alignment?
07:56airlied: not sure if it's 16B yet, it definitely doesn't like 4 bytes
07:57fdobridge: <gfxstrand> gross
07:57fdobridge: <airlied> also not all local memory tests are failing, but a bunch are
08:08fdobridge: <airlied> okay it's grosser than that, updated the branch, all 542 cond render tests pass now
08:08fdobridge: <airlied> I think the local bo fix is probably some sort of missing stall
08:08fdobridge: <airlied> but at least this gives us a working baseline
08:09fdobridge: <airlied> nvc0 has some gross stalling code, that might need leveraging
08:10fdobridge: <airlied> nvc0_hw_query_fifo_wait
08:10fdobridge: <gfxstrand> Ugh
08:11fdobridge: <airlied> I expect we need to get some nvidia cmd buffer dumping going here
09:24fdobridge: <karolherbst🐧🦀> kinda depends. The way nvc0 uses semaphores is super weird, but the CPU has to wait on the semaphore somehow if the CPU needs the result, if not, then just wait on the semaphore
09:24fdobridge: <esdrastarsis> Ultra 800x600
09:24fdobridge: <karolherbst🐧🦀> (in the push buffer that is)
09:24fdobridge: <esdrastarsis> I'm using an ultrawide monitor :frog_gears:
09:25fdobridge: <esdrastarsis> I can write a tutorial for you if you want (on arch linux)
09:27fdobridge: <airlied> For cond render it's just GPU waits I think, but it's weird esp with vram and bo alignments
09:27fdobridge: <karolherbst🐧🦀> ahh yeah..
09:28fdobridge: <karolherbst🐧🦀> I think all those semaphores need to be 16-byte aligned at least
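A minimal sketch of the alignment check being discussed, assuming the requirement really is 16 bytes (the log only establishes that 4-byte alignment is not enough). The `sketch_` names and the `SKETCH_COND_ALIGN` constant are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed requirement taken from the discussion; the exact value is unconfirmed. */
#define SKETCH_COND_ALIGN 16u

/* True if addr is a multiple of align. align must be a power of two. */
bool sketch_is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Round addr up to the next multiple of align (a power of two). */
uint64_t sketch_align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

A driver could use a check like `sketch_is_aligned(addr, SKETCH_COND_ALIGN)` to decide whether an app-supplied address can be used directly or needs to go through an aligned, driver-owned location first.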
09:29fdobridge: <karolherbst🐧🦀> let's see...
09:30fdobridge: <karolherbst🐧🦀> mhh
09:32fdobridge: <karolherbst🐧🦀> @airlied huh? why this weird indirection in the mme?
09:33fdobridge: <karolherbst🐧🦀> why not just register the buffer instead passing null and emulating it in mme
09:38fdobridge: <airlied> My patch removes all the mme usage
09:38fdobridge: <airlied> Mme didnt work for compute
09:39fdobridge: <karolherbst🐧🦀> yeah... I don't know if we even have mme for compute on turing
09:40fdobridge: <karolherbst🐧🦀> and other gens
09:40fdobridge: <karolherbst🐧🦀> okay, so the remaining issue is just the alignment of that buffer?
09:40fdobridge: <karolherbst🐧🦀> or something else?
09:41fdobridge: <airlied> Both the alignment and it has to be in gart
09:41fdobridge: <karolherbst🐧🦀> why in gart?
09:41fdobridge: <airlied> Direct reading from local gets wrong answers
09:42fdobridge: <karolherbst🐧🦀> ohhh right, if the CPU wants to read from it it has to be gart right..
09:43fdobridge: <karolherbst🐧🦀> mhh? that's a bit weird though, but yeah, maybe some caching issue if the CPU side messes with the buffers content or something, though if only the GPU writes to it it should be fine, no?
09:46fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> That performance is too low
09:47fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Just get the widescreen fix 🐸
09:57fdobridge: <esdrastarsis> Yes, I noticed that OpenGL has higher performance than Vulkan
10:07fdobridge: <airlied> Yeah VK cond render is a bit different to GL, in GL the driver controls the query buffer output used for cond render, but in VK they are app controlled
10:20fdobridge: <karolherbst🐧🦀> cursed
10:23fdobridge: <phomes> I have my lunch break now so I can take a quick look
10:25fdobridge: <phomes> the mme was just a leftover from when I tried to implement it all in mme. What @airlied did is better
10:51fdobridge: <esdrastarsis> So your cond render hack can improve performance?
10:54fdobridge: <rhed0x> pretty much nothing uses Vulkan conditional rendering
10:56fdobridge: <triang3l> Does something like VKD3D?
10:56fdobridge: <rhed0x> DXVK doesn't, dunno about VKD3D-Proton
10:56fdobridge: <triang3l> D3D11 has SetPredication too
10:57fdobridge: <rhed0x> yeah but dxvk just ignores it because the implementation broke watch dogs
10:59fdobridge: <esdrastarsis> It's used by zink at least
11:00fdobridge: <phomes> In my tests cts and Sascha Willems conditionalrender works well (on the uapi)
11:01fdobridge: <phomes> with my current skill level I am not sure I can help with much for the issues that remain
11:07fdobridge: <phomes> I was unsure if we should set it for 902d as well?
12:47fdobridge: <karolherbst🐧🦀> let's see if I can figure out this `INVALID_VALUE` bug...
13:28fdobridge: <karolherbst🐧🦀> now to figure out what test actually hits this...
15:36fdobridge: <karolherbst🐧🦀> @gfxstrand I'm currently debugging a shader throwing `ILLEGAL_INSTR_ENCODING` and I think it's happening when a pipeline is launched with just a vertex shader, but not fragment one
15:38fdobridge: <karolherbst🐧🦀> but also not entirely sure... I've replaced all instructions of the VP with `NOP`s, but it's still giving me the error, so it's just my assumption that the hw might execute a non existing FP here anyway
15:38fdobridge: <karolherbst🐧🦀> `dEQP-VK.robustness.image_robustness.push.notemplate.rgba32f.dontunroll.volatile.storage_image.no_fmt_qual.img.samples_1.1d.vert`
15:39fdobridge: <gfxstrand> Interesting..
15:39fdobridge: <gfxstrand> It's possible we need a dummy VS
15:39fdobridge: <karolherbst🐧🦀> you mean dummy FS
15:39fdobridge: <gfxstrand> It's possible we need a dummy FS (edited)
15:39fdobridge: <karolherbst🐧🦀> ahh
15:39fdobridge: <karolherbst🐧🦀> 😄
15:39fdobridge: <karolherbst🐧🦀> yeah.. probably
15:39fdobridge: <karolherbst🐧🦀> I'm checking on Ada as well as this was on Pascal
15:39fdobridge: <gfxstrand> A dummy FS is easy enough to add.
15:40fdobridge: <karolherbst🐧🦀> yeah....
15:40fdobridge: <karolherbst🐧🦀> it seems like Ada doesn't care
15:40fdobridge: <karolherbst🐧🦀> and if Turing doesn't care it's probably a pre Volta thing
15:40fdobridge: <karolherbst🐧🦀> ohhhh....
15:40fdobridge: <karolherbst🐧🦀> of course it is...
15:41fdobridge: <karolherbst🐧🦀> it probably simply executes at offset 0 of the shader bo
15:41fdobridge: <karolherbst🐧🦀> or something silly
15:41fdobridge: <karolherbst🐧🦀> let me check if the FS stage is enabled/disabled
15:42fdobridge: <gfxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1133061622169096222/0001-DUMMY-FS.patch
15:43fdobridge: <gfxstrand> Insert HW version checks as appropriate
15:43fdobridge: <karolherbst🐧🦀> mhhh.. it's disabled
15:43fdobridge: <karolherbst🐧🦀> yeah.. let me try that patch to verify
15:44fdobridge: <gfxstrand> This could explain a lot of the Maxwell instability
15:44fdobridge: <karolherbst🐧🦀> needs a `MESA_SHADER_FRAGMENT`
15:44fdobridge: <karolherbst🐧🦀> ehhh
15:44fdobridge: <karolherbst🐧🦀> `nir_options`
15:45fdobridge: <gfxstrand> Right. Look a couple lines up and you can find where to get that
15:45fdobridge: <karolherbst🐧🦀> yeah
15:46fdobridge: <karolherbst🐧🦀> mhh.. that crashes a little later one
15:46fdobridge: <karolherbst🐧🦀> *on
15:47fdobridge: <karolherbst🐧🦀> `nv_nir_move_stores_to_end` crashes as it can't find a block
15:47fdobridge: <karolherbst🐧🦀> no impl
15:47fdobridge: <gfxstrand> Okay, maybe do a `nir_builder_new()` instead then. 😅
15:48fdobridge: <karolherbst🐧🦀> 😄
15:48fdobridge: <gfxstrand> Or init_simple_shader, rather
15:48fdobridge: <karolherbst🐧🦀> ahh yeah, that was the thing
15:49fdobridge: <gfxstrand> Yeah,
15:49fdobridge: <gfxstrand> ```c++
15:49fdobridge: <gfxstrand> nir_builder b =
15:49fdobridge: <gfxstrand> nir_builder_init_simple_shader(MESA_SHADER_FRAGMENT,
15:49fdobridge: <gfxstrand> nir_options, "Trivial FS");
15:49fdobridge: <gfxstrand> nir[MESA_SHADER_FRAGMENT] = b.shader;
15:49fdobridge: <gfxstrand> ```
15:49fdobridge: <gfxstrand> When in doubt, use the builder. 🙃
15:51fdobridge: <karolherbst🐧🦀> mhhhhh....
15:52fdobridge: <karolherbst🐧🦀> apparently it still happens
15:52fdobridge: <karolherbst🐧🦀> odd
15:54fdobridge: <gfxstrand> Could be something else
15:57fdobridge: <karolherbst🐧🦀> we have that `nvc0_program_init_tcp_empty` function in gl
15:58fdobridge: <karolherbst🐧🦀> yep...
16:00fdobridge: <karolherbst🐧🦀> @gfxstrand error is gone when I do it for `MESA_SHADER_TESS_EVAL` 😄
16:00fdobridge: <karolherbst🐧🦀> mhhh
16:01fdobridge: <karolherbst🐧🦀> that's weird though
16:01fdobridge: <karolherbst🐧🦀> I got other errors with `MESA_SHADER_TESS_CTRL` and that's what gl is doing... let me try that instead
16:03fdobridge: <gfxstrand> Uh... what?
16:04fdobridge: <karolherbst🐧🦀> https://gist.github.com/karolherbst/e9048f2bd7559c02ab9ac2419fdb5073
16:05fdobridge: <karolherbst🐧🦀> that's what GL is doing and it seems to resolve the problem
16:05fdobridge: <karolherbst🐧🦀> test is still failing though, but it's also a fuzzing one so maybe I ignore the failure there
16:06fdobridge: <gfxstrand> Wait, it always wants a tessellation shader?
16:06fdobridge: <karolherbst🐧🦀> yes
16:06fdobridge: <karolherbst🐧🦀> I'm sure we do this in gl for a reason
16:06fdobridge: <karolherbst🐧🦀> see `nvc0_tctlprog_validate`
16:07fdobridge: <karolherbst🐧🦀> though we set a different value
16:07fdobridge: <karolherbst🐧🦀> `0x20` vs `0x21`
16:07fdobridge: <karolherbst🐧🦀> kinda weird
16:08fdobridge: <karolherbst🐧🦀> so yeah.. we disable the shader, but still put a program there and bind it 😄
16:09fdobridge: <karolherbst🐧🦀> they probably fixed whatever the hardware expected here for volta+ I think
16:09fdobridge: <gfxstrand> Heh. Could be, yeah.
16:10fdobridge: <karolherbst🐧🦀> yeah.. the diff in itself causes more issues mhh...
16:10fdobridge: <gfxstrand> 😓
16:43fdobridge: <karolherbst🐧🦀> yeah mhhh... no idea
16:45fdobridge: <karolherbst🐧🦀> maybe it's something with prefetching again
16:46fdobridge: <karolherbst🐧🦀> I'll... deal with other errors first 🙃
16:55fdobridge: <gfxstrand> I could believe we have prefetching bugs
18:05anholt: mme_builder_test.while_ine seems to consistently time out in ci now.
18:28fdobridge: <gfxstrand> Is that new? I didn't think we'd really changed any of that code in a while.
18:30anholt: wasn't timing out for me last week, now I'm at 3/3.
18:36fdobridge: <gfxstrand> weird...
18:37fdobridge: <gfxstrand> Last change to src/nouveau/mme was in March.
18:37fdobridge: <gfxstrand> Maybe one of the winsys changes affected it?
18:38fdobridge: <gfxstrand> Yeah, blowing up for me too
18:39fdobridge: <gfxstrand> sanity is failing.... That's not good
18:39fdobridge: <gfxstrand> Oh, wait, new UAPI broke it
18:41fdobridge: <gfxstrand> bisects
18:43fdobridge: <gfxstrand> Okay, now they're all working for me. What the hell?
18:45fdobridge: <gfxstrand> There must be something uninitialized somewhere
18:56fdobridge: <gfxstrand> Oh, no. My one fail was because I typed in the wrong terminal and ran it on my intel box. 🤦🏻♀️
19:26fdobridge: <gfxstrand> Okay, updated to the new UAPI and they all still pass
19:26fdobridge: <gfxstrand> @eanholt what hardware are you seeing fails on?
19:42fdobridge: <karolherbst🐧🦀> ......
19:42fdobridge: <karolherbst🐧🦀> ..................................
19:42fdobridge: <karolherbst🐧🦀> `emitNOP` generates a `NOP` the hardware doesn't like 🙃
19:44fdobridge: <gfxstrand> Lovely
19:44fdobridge: <karolherbst🐧🦀> anyway, I figured out what's broken with the test and it's `sustp`
19:47fdobridge: <karolherbst🐧🦀> I blame `nvdisasm` for not shouting at me
19:49fdobridge: <karolherbst🐧🦀> ` /*0018*/ @!PT NOP.TRIG CC.RGT, 0xfff; /* 0x50b00000ffffffff */` ?!?
19:49fdobridge: <karolherbst🐧🦀> apparently the NOP instruction has some weirdo stuff on it
20:02fdobridge: <gfxstrand> funky
20:40fdobridge: <karolherbst🐧🦀> ohhh
20:40fdobridge: <karolherbst🐧🦀> right...
20:40fdobridge: <karolherbst🐧🦀> that's the unformatted image thing..
20:42fdobridge: <karolherbst🐧🦀> @gfxstrand coordinates of unformatted image loads in nir are they based on bytes or dest type?
20:43fdobridge: <karolherbst🐧🦀> noice...
20:43fdobridge: <karolherbst🐧🦀> enabling unformatted loads on the ISA level gets rid of the error
20:48fdobridge: <karolherbst🐧🦀> well.. that throws a misaligned address error 🙃
20:48fdobridge: <karolherbst🐧🦀> (assuming byte addressing that is)
20:50fdobridge: <gfxstrand> should be pixels
20:50fdobridge: <karolherbst🐧🦀> okay..
20:50fdobridge: <karolherbst🐧🦀> so we have to emit `OP_SUSTB` and `OP_SULDB` for images without formats
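If the coordinates coming out of NIR are in pixels while a raw access expects a byte offset (an assumption; the log does not confirm how `OP_SULDB`/`OP_SUSTB` address memory), the coordinate would need scaling by the texel size. For the rgba32f format in the failing test that is 16 bytes per texel. A hypothetical helper to illustrate:

```c
#include <stdint.h>

/* rgba32f: four 32-bit floats per texel. */
#define SKETCH_RGBA32F_BYTES_PER_TEXEL 16u

/* Convert a 1D texel coordinate into a byte offset for a raw access. */
uint32_t sketch_texel_to_byte_offset(uint32_t x, uint32_t bytes_per_texel)
{
    return x * bytes_per_texel;
}
```

Passing a pixel coordinate such as 3 straight through as a byte offset would not land on a 16-byte texel boundary, which would be consistent with the misaligned address error seen when assuming byte addressing.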
20:51fdobridge: <karolherbst🐧🦀> let's see if that fixes the test
20:52fdobridge: <karolherbst🐧🦀> mhhh... something up with stores
20:53fdobridge: <karolherbst🐧🦀> okay.. at least I know how to fix it..
20:53fdobridge: <karolherbst🐧🦀> @gfxstrand mind checking real quick if `dEQP-VK.robustness.image_robustness.push.notemplate.rgba32f.dontunroll.volatile.storage_image.no_fmt_qual.img.samples_1.1d.vert` passes on turing?
20:54airlied: passes for me here
20:55fdobridge: <karolherbst🐧🦀> mhh..
20:55fdobridge: <karolherbst🐧🦀> I wonder if I break turing/ampere then
21:03fdobridge: <karolherbst🐧🦀> mhhh
21:03fdobridge: <karolherbst🐧🦀> I wonder if SULDB is broken, because the test fails, but at least the gpu stops throwing an error
21:04fdobridge: <karolherbst🐧🦀> the `SULDB` code was always unused, but was kept in case CL ever emerged
21:11fdobridge: <karolherbst🐧🦀> ohhh... I see