00:10 mhenning[d]: Yeah, it's not a promoted buffer
00:10 mhenning[d]: You use NV0080_CTRL_FIFO_GET_ENGINE_CONTEXT_PROPERTIES_ENGINE_ID_GRAPHICS_ZCULL to get the size, then allocate, then call NV2080_CTRL_CMD_GR_CTXSW_ZCULL_BIND with the pointer
00:12 mhenning[d]: airlied[d]: If we make the zcull ctxsw buffer optional, then I think we would put contexts into NV2080_CTRL_GR_SET_CTXSW_ZCULL_MODE_NO_CTXSW when we init and then change them to NV2080_CTRL_GR_SET_CTXSW_ZCULL_MODE_SEPARATE_BUFFER when the buffer is allocated
00:14 airlied[d]: I don't think you need to do anything on init
00:15 airlied[d]: skeggsb9778[d]: what is PM in the ctxsw context? I see alongside ZCULL there are PM and SETUP things
00:15 skeggsb9778[d]: perfmon stuff
00:16 airlied[d]: ah perfmon, makes a lot more sense now
00:16 mhenning[d]: airlied[d]: mhh I assumed we'd get NV2080_CTRL_GR_SET_CTXSW_ZCULL_MODE_GLOBAL by default and we don't want that
00:16 skeggsb9778[d]: i don't know much about it beyond that, i never tried to implement it in nouveau - the only perfmon stuff nouveau has is for way older gpus
00:17 skeggsb9778[d]: oh, and i deleted all that already 😛
00:17 airlied[d]: the kernel never sets zcull mode on its gr context so I think doing nothing is fine, and GLOBAL seems to imply if one person sets it, everyone uses it, so might not be default
00:21 mhenning[d]: airlied[d]: I guess the concern would be that the proprietary userspace might unconditionally call NV2080_CTRL_CMD_GR_CTXSW_ZCULL_BIND at init - I actually didn't trace the no-zbuffer case
00:21 mhenning[d]: I guess I should do that
00:30 skeggsb9778[d]: airlied[d]: yeah, RM defaults to NO_CTXSW
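The zcull flow discussed above (contexts start in NO_CTXSW, then move to SEPARATE_BUFFER once a buffer has been sized, allocated, and bound) can be sketched as a small state machine. This is only an illustration: `ZcullMode`, `GrContext`, and `bind_zcull_buffer` are hypothetical names, no real RM control call is issued, and the numeric encodings of the NV2080_CTRL_GR_SET_CTXSW_ZCULL_MODE_* values are not modelled.

```rust
/// Hypothetical Rust mirror of the zcull context-switch modes named in the
/// discussion. The real numeric values are deliberately not modelled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ZcullMode {
    Global,
    NoCtxsw,
    SeparateBuffer,
}

/// Hypothetical per-context state; not a real nouveau or RM structure.
#[derive(Debug)]
pub struct GrContext {
    mode: ZcullMode,
    zcull_buf: Option<Vec<u8>>,
}

impl GrContext {
    /// A new context starts in NO_CTXSW, matching the RM default mentioned
    /// in the log, so nothing extra has to happen at init.
    pub fn new() -> Self {
        GrContext { mode: ZcullMode::NoCtxsw, zcull_buf: None }
    }

    pub fn mode(&self) -> ZcullMode {
        self.mode
    }

    /// Size of the bound buffer, if any.
    pub fn zcull_buf_len(&self) -> Option<usize> {
        self.zcull_buf.as_ref().map(|b| b.len())
    }

    /// Stand-in for "get the size, allocate, bind". `size` plays the role of
    /// the value the size query would return. Only after a successful bind
    /// does the context switch to SEPARATE_BUFFER.
    pub fn bind_zcull_buffer(&mut self, size: usize) -> Result<(), &'static str> {
        if size == 0 {
            return Err("zcull buffer size must be non-zero");
        }
        self.zcull_buf = Some(vec![0u8; size]);
        self.mode = ZcullMode::SeparateBuffer;
        Ok(())
    }
}
```

With this shape, a context that never binds a buffer simply stays in `NoCtxsw`, which is the "optional buffer" behaviour being proposed.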
03:35 airlied[d]: gfxstrand[d]: do we support carry-in/out on imad?
03:49 gfxstrand[d]: Not at the moment
03:49 gfxstrand[d]: We do support 64-bit lea, though
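For readers unfamiliar with the term, carry-in/carry-out means chaining 32-bit operations so that the overflow bit of the low half feeds the high half. The sketch below shows this with plain integer adds. It is ordinary arithmetic only, not a model of the hardware imad encoding, and the function names are made up for illustration.

```rust
/// 32-bit add with an explicit carry-in, returning the sum and the carry-out.
pub fn add_carry(a: u32, b: u32, carry_in: bool) -> (u32, bool) {
    let (s1, c1) = a.overflowing_add(b);
    let (s2, c2) = s1.overflowing_add(carry_in as u32);
    // At most one of the two partial adds can overflow.
    (s2, c1 || c2)
}

/// 64-bit wrapping add built from two 32-bit adds linked by the carry.
pub fn add64_via_halves(a: u64, b: u64) -> u64 {
    // Low halves: no carry-in, keep the carry-out.
    let (lo, carry) = add_carry(a as u32, b as u32, false);
    // High halves: consume the carry from the low add.
    let (hi, _) = add_carry((a >> 32) as u32, (b >> 32) as u32, carry);
    ((hi as u64) << 32) | lo as u64
}
```

The result matches `u64::wrapping_add`; the point is only that the high half cannot be computed until the low half's carry is known.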
03:51 gfxstrand[d]: Since I'm farting around with constants today...
03:51 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33785
03:51 gfxstrand[d]: I've been meaning to do that for a loooong time.
03:52 gfxstrand[d]: I think it can probably be improved if we include `is_const()` in the heuristic for what to spill.
04:15 airlied[d]: oh I think the latencies stuff is mostly working for turing
04:17 gfxstrand[d]: I'm hoping to review Mel's scheduler tomorrow and then steal the dependency graph and use it to make calc_instr_deps better.
04:18 airlied[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33573
04:19 airlied[d]: I might start in on ampere now that I'm pretty happy that it makes sense
04:21 airlied[d]: I think imad might need more work, and waw latencies with predicates, but I think they are correct just not optimal currently
04:43 gfxstrand[d]: Yeah, imad is a funky one, according to Karol.
04:43 gfxstrand[d]: We should arguably use it more
04:44 airlied[d]: the latencies change depending on if you are using the lower or upper 32-bits of the result
04:49 gfxstrand[d]: Fun...
04:49 gfxstrand[d]: Not surprising but still
04:50 gfxstrand[d]: Someone should do a full RE on imad and imad.x and then we can figure out what to use it for.
05:35 HdkR: imad was great pre-volta
05:41 HdkR: Kind of became redundant once 32-bit integer got full throughput :D
05:43 rinlovesyou[d]: airlied[d]: ooo i'll give this a test
05:49 airlied[d]: would be interesting to know if anything moves at all on turing with it, I'm kinda doubtful but who knows 🙂
06:01 rinlovesyou[d]: booting up my favorite benchmark, just cause 4
06:01 rinlovesyou[d]: which can not reach 60fps on master on the lowest settings at 1440p
06:04 rinlovesyou[d]: yeah i can't say i see a difference, as always hovering more around 45-50, going down in heavier areas
06:12 redsheep[d]: I think I might try some pcie scaling benchmarks, setting back the generation on my card to see how often pcie is a bottleneck
06:13 redsheep[d]: There has to be something like that wrong, it feels off for this kind of thing not to help
06:14 airlied[d]: nah it never feels off to me for ALU changes not to help
06:15 airlied[d]: you really have to have gotten rid of every other possible bottleneck before they matter
06:15 airlied[d]: I'm just doing it because I don't want to have to keep hacking the latencies to make HMMA work
06:15 airlied[d]: there's a nice benchmark written by nvidia for coop matrix, and I'd like to narrow the gap on that
06:47 rinlovesyou[d]: i'm just optimistically testing any MR that looks like it may bring perf improvements
06:48 rinlovesyou[d]: i am itching to jump ship
07:03 magic_rb[d]: Same here, but i gotta wait for 570 gsp and zfs because rn the gsp dies way too often
15:44 gfxstrand[d]: Doing a CTS run with `NAK_DEBUG=spill`. This is gonna take a while...
16:32 gfxstrand[d]: Of course I had to fix `NAK_DEBUG=spill` first...
16:45 karolherbst[d]: I had to fix `nir_opt_dead_write_vars` as it broke with function calls https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33807 gfxstrand[d] if you have some time to review it
16:46 karolherbst[d]: ehh wait.. don't we have a helper for that..
16:47 karolherbst[d]: none that would help
16:52 gfxstrand[d]: Yay! djdeath3483[d] fixed our shader cache miss issue. 🎉
16:54 djdeath3483[d]: cool
17:04 gfxstrand[d]: Yeah, I noticed that DA:TV would spend a while pre-compiling shaders in steam with fossilize and then hit all the cache misses when the game starts. It's now pulling everything from the disk cache AFAICT.
17:04 redsheep[d]: Hopefully that helps cut down some of the stutter
17:05 gfxstrand[d]: It should
17:05 gfxstrand[d]: I'm a little surprised I've never seen it in valgrind
17:27 redsheep[d]: asdqueerfromeu[d]: Can the shaderSharedInt64Atomics box finally get checked on the tracker now? https://gitlab.freedesktop.org/mesa/mesa/-/issues/9479
17:27 redsheep[d]: Unless I am still missing something again it seems like this is a thing now https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33572
17:29 mhenning[d]: Yeah, should be able to
17:40 redsheep[d]: Doesn't mesa have some kind of fallback for really slow raytracing on gpus that don't support it? I thought I remembered that being a thing for rdna 1 or gcn with radv at least. If anything like that could be adapted for nvk I wonder if just getting those features enabled would be a good idea, even if it's extremely slow and has no awareness of how the hardware is meant to do it
17:45 gfxstrand[d]: It was worked on but it's not necessarily portable across GPUs.
17:48 redsheep[d]: Hmm so non-trivial then. Maybe it doesn't make sense to do, I suppose there could be applications that would end up getting slower, expecting those extensions to be fast
17:49 mhenning[d]: Yeah, I'd rather not add really slow emulated stuff - effort is better spent working on doing it correctly
17:51 gfxstrand[d]: Okay, I think I've convinced myself my spilling patch is correct. I just need to decide how much I hate `OpCopy::tmp`.
17:51 gfxstrand[d]: And I should probably pipeline-db it
17:51 tiredchiku[d]: considering making my NVK environment my primary boot option
17:52 tiredchiku[d]: :doomthink:
17:52 tiredchiku[d]: unfortunately nouveau handicaps my monitor's refresh rate
17:52 tiredchiku[d]: I've only really been playing dxvk titles of late, would be a good way to get some rigorous testing going on nvk
17:53 redsheep[d]: mhenning[d]: It just seems like something where it's exceptionally difficult to go from no implementation at all, to working well enough to default on. Having something slow that is functional enough to default or at least put behind an environment variable could help ease things. Or at least, that was my logic. Maybe not valid here.
17:55 mhenning[d]: gfxstrand[d]: I said this on the issue tracker, but can you just insert OpParCopy?
18:02 gfxstrand[d]: Yeah but then we have to make OpParCopy support other things as sources. I don't know how much work that is. In theory, it should be able to since I was planning to do that in the early days.
18:02 gfxstrand[d]: I can try that
18:02 gfxstrand[d]: Or we can make RA emit two instructions and eat the refactor
18:04 gfxstrand[d]: If we make OpParCopy so it can handle immediate and cbuf sources, that maybe makes the spilling a little easier. IDK that it actually makes us emit less shader code but maybe makes a few things easier.
18:04 gfxstrand[d]: It's worth a second attempt, I guess.
18:08 gfxstrand[d]: I really need a proper pipeline-db
18:11 mhenning[d]: gfxstrand[d]: Not sure this counts as "proper", but I have a few fossils here https://gitlab.freedesktop.org/mhenning/nvk-fossils-foss
18:12 mhenning[d]: I normally run that + the shaderdb pipelines
18:32 gfxstrand[d]: That'll help at any rate
18:38 asdqueerfromeu[d]: redsheep[d]: I guess this MR flew under my radar
18:59 cwabbott: when I was doing ir3 RA, making phis/parallel copies support immediate and const sources was absolutely crucial
19:00 cwabbott: the shader-db numbers were really terrible without it
19:01 cwabbott: there's a reason both ir3 and aco support this sort of trick
19:03 cwabbott: anything that can be easily "rematerialized" without any extra registers should really really be folded into the parallel copy source
19:05 gfxstrand[d]: mhenning[d]: I'm testing that patch now. I don't hate it. If it's all good, we'll probably do that
19:07 gfxstrand[d]: And I think it's safe to do it. There was a time when I was really nervous about anything except a SSA/reg going in an `OpParCopy` but I think it's safe with the current definitions.
19:13 gfxstrand[d]: I went around in circles a bit on `OpParCopy`. Originally, it was going to handle everything except bindless cbufs. Then I freaked because "OMG! Vectors!" But then I decided each `OpParCopy` src/dst pair was either a predicate or a dword. So now it's fine again except nothing really takes advantage of that.
19:21 gfxstrand[d]: The other thing to do is to hook up the new ConstTracker to the legalize code so we re-materialize things instead of copying GPRs.
19:22 gfxstrand[d]: That's one of the big issues with texture ops right now. We emit a bunch of copies to ensure we can form a vec but they're copies from GPRs instead of just blasting out the immediate.
19:23 gfxstrand[d]: Any time we can just remat an immediate or cbuf value instead of copying a GPR is a potential win for register pressure.
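The idea of letting parallel copies take immediate and cbuf sources can be sketched as follows. This is a minimal, textbook-style sequentialization under stated assumptions, not NAK's actual `OpParCopy` lowering: `Src`, `Mov`, `sequentialize`, and `apply` are hypothetical names, destinations are assumed distinct, and `tmp` is assumed to be a register no copy touches. Rematerializable sources read no register, so they never take part in a cycle and never need the temporary.

```rust
/// Source of one copy: a register, or something rematerializable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Src {
    Reg(usize),
    Imm(u32),
    CBuf { idx: usize, dword: usize },
}

/// One src/dst pair of a parallel copy (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mov {
    pub dst: usize,
    pub src: Src,
}

/// Turn a parallel copy into an equivalent sequence of moves.
pub fn sequentialize(par: &[Mov], tmp: usize) -> Vec<Mov> {
    // Self copies are no-ops and can be dropped.
    let mut pending: Vec<Mov> = par
        .iter()
        .copied()
        .filter(|m| m.src != Src::Reg(m.dst))
        .collect();
    let mut out = Vec::new();
    while !pending.is_empty() {
        // A copy is ready once no pending copy still reads its destination.
        let ready = pending
            .iter()
            .position(|m| !pending.iter().any(|o| o.src == Src::Reg(m.dst)));
        match ready {
            Some(i) => out.push(pending.remove(i)),
            None => {
                // Only register-to-register cycles are left. Save one
                // destination in tmp and redirect its readers to tmp.
                let d = pending[0].dst;
                out.push(Mov { dst: tmp, src: Src::Reg(d) });
                for m in pending.iter_mut() {
                    if m.src == Src::Reg(d) {
                        m.src = Src::Reg(tmp);
                    }
                }
            }
        }
    }
    out
}

/// Tiny interpreter used to check a move sequence.
pub fn apply(seq: &[Mov], regs: &mut [u32], cbufs: &[Vec<u32>]) {
    for m in seq {
        let v = match m.src {
            Src::Reg(r) => regs[r],
            Src::Imm(v) => v,
            Src::CBuf { idx, dword } => cbufs[idx][dword],
        };
        regs[m.dst] = v;
    }
}
```

In this sketch, a swap of two registers costs one extra move through `tmp`, while an immediate or cbuf source costs exactly one move and ties up no source register, which is the register-pressure argument made in the log.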
19:53 redsheep[d]: I am getting back around to the zink testing with the aim to get issues opened for all of this. I just updated to latest on everything, and I am still seeing massive discord flicker on specifically plasma wayland. I am also still seeing that only one monitor can be used at a time, again specifically plasma wayland. The x11 session doesn't have either of these issues (though it has others)
19:56 redsheep[d]: Anybody have bright ideas on how I can narrow down where to report these? It feels unlikely these are plasma issues, especially if somebody here on radv or anv can confirm plasma wayland+zink doesn't break the same way.
19:56 tiredchiku[d]: enable your igpu ya nerd
19:57 tiredchiku[d]: :3
19:58 redsheep[d]: My igpu only has one display connection I have the cables to use at the moment
19:59 redsheep[d]: I could check the discord issue I suppose, but not the multi monitor one
20:00 redsheep[d]: Didn't you say you already confirmed the discord issue doesn't happen for you on latest everything? That was the full on electron discord client?
20:01 tiredchiku[d]: correct
20:01 tiredchiku[d]: but also, ampere
20:01 redsheep[d]: If that's the case I am relatively sure this is ada being broken somehow, and I should go ahead with opening the issue on mesa
20:04 redsheep[d]: tiredchiku[d]: Do you have display scaling on? What scale factor?
20:04 tiredchiku[d]: no, 100%
20:05 tiredchiku[d]: 1440p 27 inch is fine enough
20:05 redsheep[d]: Yeah I just want to isolate a couple more variables if I can to make this issue a little more airtight
20:07 tiredchiku[d]: made my nvk env the default boot option
20:07 tiredchiku[d]: adventures begin tomorrow :3
20:08 redsheep[d]: That's your nvk+zink+plasma wayland config?
20:09 tiredchiku[d]: ya
20:09 redsheep[d]: tiredchiku[d]: Lemme know how that goes
20:09 tiredchiku[d]: nvk + zink, can switch out the de/wm at will, but yeah
20:10 tiredchiku[d]: will be dropping max refresh rate to 120 hz so I can continue to use DP audio without the display pretending there's an earthquake
20:11 redsheep[d]: Have you opened an issue for that on drm/nouveau?
20:11 tiredchiku[d]: or maybe I'll just switch to front panel audio
20:12 tiredchiku[d]: redsheep[d]: not yet, no
20:13 tiredchiku[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1344764260978266266/iN1RQf9.png?ex=67c218d0&is=67c0c750&hm=30332010abca58474add400e56fc621e8ab8a30d1aa500628b2273c56b259d3b&
20:13 tiredchiku[d]: :frog_ok:
20:13 redsheep[d]: Strange... Discord is quite a lot less flickery with 100% scaling on. It's still bad though.
20:15 tiredchiku[d]: unfortunately NVK doesn't like Alan Wake
20:15 tiredchiku[d]: https://discord.com/channels/1033216351990456371/1171505720349446276/1344764872084426873
20:17 redsheep[d]: Alright when I get back from canceling comcast I will open a load of new issues
20:17 tiredchiku[d]: I'll open a few in my morning too I suppose
20:18 gfxstrand[d]: Oh, android... :frog_weary:
20:24 gfxstrand[d]: mhenning[d]: If you feel like reading through the const spill MR again, I think it's in its final form now.
20:25 gfxstrand[d]: The code is soooo readable compared to where that mess started. 😄
20:25 gfxstrand[d]: It feels dangerously simple.
20:27 gfxstrand[d]: Also, there's a patch in there that affects the fossil-db patch.
20:27 redsheep[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1344767857648734220/hexcellsSteamLog.txt?ex=67c21c29&is=67c0caa9&hm=2a754c2c7767c3c64347e5a1aa2f51608de74cb5042572c8f8a526fca31cfebf&
20:27 redsheep[d]: Huh. Seems like games don't launch for me, in general. Something is quite busted with this session. If it's of any interest, here's what my terminal for steam said on the subject. It's failing early enough not to generate anything at all with ```PROTON_LOG=1```
20:28 redsheep[d]: Also it seems there are GSP errors in dmesg for failing to make the second display work.
20:28 redsheep[d]: ```[ 1309.638565] nouveau 0000:01:00.0: gsp: cli:0xc1d00001 obj:0x00730000 ctrl cmd:0x00731341 failed: 0x0000ffff
20:28 redsheep[d]: [ 1309.638571] nouveau 0000:01:00.0: [drm] *ERROR* DP-4: invalid native reply 0x03
20:28 redsheep[d]: [ 1309.644537] nouveau 0000:01:00.0: gsp: cli:0xc1d00001 obj:0x00730000 ctrl cmd:0x00731341 failed: 0x0000ffff
20:28 redsheep[d]: [ 1309.644541] nouveau 0000:01:00.0: [drm] *ERROR* DP-4: invalid native reply 0x03
20:28 redsheep[d]: [ 1309.650666] nouveau 0000:01:00.0: gsp: cli:0xc1d00001 obj:0x00730000 ctrl cmd:0x00731341 failed: 0x0000ffff
20:28 redsheep[d]: [ 1309.650668] nouveau 0000:01:00.0: [drm] *ERROR* DP-4: invalid native reply 0x03
20:28 gfxstrand[d]: Looks like gamescope is failing to start XWayland
20:28 redsheep[d]: [ 1309.657269] nouveau 0000:01:00.0: gsp: cli:0xc1d00001 obj:0x00730000 ctrl cmd:0x00731341 failed: 0x0000ffff
20:28 redsheep[d]: [ 1309.657271] nouveau 0000:01:00.0: [drm] *ERROR* DP-4: invalid native reply 0x03```
20:28 tiredchiku[d]: Alan Wake 1 loses most perf on Shadows and Anti-Aliasing, in that order
20:29 gfxstrand[d]: ZCULL and color compression
20:29 gfxstrand[d]: In that order. 😛
20:29 redsheep[d]: Those tend to just be the heaviest in games anyway
20:29 tiredchiku[d]: everything low saw ~115fps, turning just shadows to high dropped it to ~70
20:29 tiredchiku[d]: from low, that is
20:30 tiredchiku[d]: doing the same w/ anti-aliasing dropped it to ~95
20:32 tiredchiku[d]: also it has 2 AA settings, separate
20:32 tiredchiku[d]: one labeled anti-aliasing, one labeled FXAA
20:32 tiredchiku[d]: the internet tells me, `Multi-sample antialiasing not supported by your graphics hardware. Alan Wake requires this in order to run. Please see that your system conforms with the minimum specification and ensure you have the latest graphics card drivers installed`.
20:32 tiredchiku[d]: so AW1 _needs_ MSAA
20:32 tiredchiku[d]: tiredchiku[d]: this was from MSAA 2x -> 8x
20:34 tiredchiku[d]: the setting labeled AA doesn't go below 2x, so
20:34 redsheep[d]: That GSP error almost makes me think it is trying something invalid with this second display, but... it lets me set the refresh rate before enabling, I am only attempting 4k60
20:35 redsheep[d]: This issue might be some combo of kernel and plasma and wayland edid handling, rather than being nvk or zink. Kinda hard to tell though.
20:36 mohamexiety[d]: I tried a while back to take a closer look at the display code to see if the fancier stuff could be implemented but it was a bit too much 😦
21:12 redsheep[d]: mohamexiety[d]: Yeah, no worries. I expected this falls into the waiting for Nova category
21:12 tiredchiku[d]: https://tenor.com/view/mr-bean-waiting-still-waiting-gif-13052487
21:12 tiredchiku[d]: /j
21:16 redsheep[d]: Considering how difficult the Rust for Linux progress has appeared to be I think getting anything done is impressive. I do kind of worry that Nova will end up stuck in limbo though
21:21 redsheep[d]: I do also kind of agree with the earlier sentiment that getting stuff working on nouveau will help Nova when the time comes but that doesn't mean it's easy
21:25 gfxstrand[d]: NGL, I kinda love the fact that `NAK_DEBUG=spill` doesn't make a dent in my CTS runtimes.
21:58 gfxstrand[d]: We should probably `Box<>` our `RegTracker`s... I have a feeling they're a bit big for windows. :frog_sweat:
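The concern about `RegTracker` size comes down to where a large value lives: held inline, it occupies stack space in every frame that owns it, while behind a `Box` the owner holds only a pointer. The sketch below illustrates the size difference with made-up stand-in types; `InlineTracker`, `BoxedTracker`, and `SLOTS` are hypothetical, and the real NAK `RegTracker` layout is not reproduced here.

```rust
use std::convert::TryInto;
use std::mem::size_of;

/// Arbitrary size chosen for illustration only.
pub const SLOTS: usize = 4096;

/// Tracker stored inline: the whole array sits wherever the struct sits,
/// including on the stack when it is a local variable.
pub struct InlineTracker {
    pub slots: [u64; SLOTS],
}

/// Tracker whose storage lives on the heap; the struct is pointer-sized.
pub struct BoxedTracker {
    pub slots: Box<[u64; SLOTS]>,
}

impl BoxedTracker {
    /// Build the storage through a Vec so the array is allocated on the heap
    /// from the start instead of being constructed on the stack and moved.
    pub fn new() -> Self {
        let slots: Box<[u64; SLOTS]> = vec![0u64; SLOTS]
            .into_boxed_slice()
            .try_into()
            .expect("length matches SLOTS");
        BoxedTracker { slots }
    }
}

/// Returns (inline size, boxed size) in bytes.
pub fn tracker_sizes() -> (usize, usize) {
    (size_of::<InlineTracker>(), size_of::<BoxedTracker>())
}
```

Here the inline variant is `SLOTS * 8` bytes while the boxed one is the size of a single pointer, at the cost of one heap allocation and an extra indirection on access.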
22:28 mhenning[d]: In the kernel, is there a way to get an `nvkm_chan` from a `nouveau_abi16_chan`? I'm trying to wire up an ioctl for NV2080_CTRL_CMD_GR_CTXSW_ZCULL_BIND and I can't figure out how to get the channel handle starting from uapi objects
22:29 airlied[d]: okay cts passed on turing timings
22:31 airlied[d]: mhenning[d]: no it needs a new nvif/nvkm interface
23:38 skeggsb9778[d]: mhenning[d]: https://bpa.st/VXNQ
23:39 skeggsb9778[d]: it'd have been somewhat less annoying had my remove-ioctl series been merged already, but that should get you going
23:56 snowycoder[d]: Parser is starting to work :3
23:56 snowycoder[d]: I just need to manually implement all the instructions with strange formats
23:59 gfxstrand[d]: mhenning[d]: scheduler looks really good. My only real qualm is the barrier thing.