00:30fdobridge: <CME> whee
00:30fdobridge: <CME> https://cdn.discordapp.com/attachments/1034184951790305330/1062528935772033084/Bildschirmfoto_vom_2023-01-11_00-50-54.png
00:30fdobridge: <CME> https://cdn.discordapp.com/attachments/1034184951790305330/1062528936174702702/Bildschirmfoto_vom_2023-01-11_01-02-03.png
00:37fdobridge: <karolherbst🐧🦀> ohh wow.. do you own the talos principle?
00:37fdobridge: <karolherbst🐧🦀> or maybe I just give it a shot over the weekend or something myself 😄
00:37fdobridge: <CME> yeah it segfaults
00:37fdobridge: <CME> both linux and wine
00:38fdobridge: <karolherbst🐧🦀> mhh
00:38fdobridge: <karolherbst🐧🦀> ohh.. probably because it's kepler
00:38fdobridge: <karolherbst🐧🦀> oops
00:38fdobridge: <CME> 750ti is maxwell v1
00:38fdobridge: <karolherbst🐧🦀> ehh, right..
00:39fdobridge: <karolherbst🐧🦀> but you can reclock, so that's the important part
00:39fdobridge: <karolherbst🐧🦀> but you'll need a bunch of things
00:39fdobridge: <karolherbst🐧🦀> https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/118
00:39fdobridge: <karolherbst🐧🦀> https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/177
00:39fdobridge: <CME> ah, i did only apply 177 so far
00:39fdobridge: <karolherbst🐧🦀> yeah.. for games !118 will matter a lot more
00:41fdobridge: <CME> nice, that fixed it
00:47fdobridge: <karolherbst🐧🦀> cool
00:47fdobridge: <karolherbst🐧🦀> how fast does it run on pstate 0xf?
00:47fdobridge: <CME> hm, still dies after the croteam video
00:47fdobridge: <karolherbst🐧🦀> and is it faster than GL?
00:47fdobridge: <karolherbst🐧🦀> ehh 😢
00:47fdobridge: <CME> but the video was running with 1sec per frame
00:47fdobridge: <Esdras Tarsis> Try running in windowed mode and low resolution
00:47fdobridge: <CME> `[Mi Jan 11 01:46:50 2023] nouveau 0000:01:00.0: Talos[92332]: nv50cal_space: -16`
00:47fdobridge: <karolherbst🐧🦀> `echo 0xf > /sys/kernel/debug/dri/0/pstate`
00:48fdobridge: <karolherbst🐧🦀> mhh
00:48fdobridge: <karolherbst🐧🦀> sounds bad
00:48fdobridge: <karolherbst🐧🦀> sooo
00:48fdobridge: <karolherbst🐧🦀> it's out of VRAM 🙂
00:48fdobridge: <CME> yea already did that
00:48fdobridge: <CME> how? since i can't reach the main menu
00:48fdobridge: <karolherbst🐧🦀> @gfxstrand we might have to care more about VRAM consumption and actually have to figure out if we waste a ton of VRAM or something 😄
00:50fdobridge: <CME> also ` 60,00% [vdso] [.] __vdso_clock_gettime` when it's running that slow
00:51fdobridge: <karolherbst🐧🦀> yeah well... nouveau works great until you run into issues and then it just breaks 🥲
00:52fdobridge: <karolherbst🐧🦀> anyway, you are out of VRAM and things are not working out that great in that case
00:52fdobridge: <CME> ~~just like with rocm on amdgpu~~
00:57fdobridge: <karolherbst🐧🦀> I know that nvk is doing some stupid things, the main question is... what exactly is the issue with talos here
00:57fdobridge: <karolherbst🐧🦀> how much VRAM does your GPU have?
00:57fdobridge: <CME> 2GB
00:57fdobridge: <karolherbst🐧🦀> maybe I'll deal with those issues and see what I can figure out
01:03fdobridge: <gfxstrand> Yeah, 2GB isn't a lot. We should still be able to fit, probably.
01:04fdobridge: <gfxstrand> We may be allocating a bunch of stuff VRAM-only
01:06fdobridge: <karolherbst🐧🦀> we probably want to bail if we know at submission time that we submit more memory than there is VRAM or something.. dunno
01:07fdobridge: <karolherbst🐧🦀> could probably also start not submitting with all the buffers attached or something.. no idea what's the best approach here and how much pain it would be later on if we continue to do that
01:08fdobridge: <gfxstrand> We can't rely on splitting batches
01:08airlied: later we won't be attaching any buffers since the api doesn't take any
01:08fdobridge: <karolherbst🐧🦀> that's not what I meant
01:08fdobridge: <karolherbst🐧🦀> ahh
01:08fdobridge: <karolherbst🐧🦀> soo...
01:08fdobridge: <karolherbst🐧🦀> how do you deal with oversubscribing? we just fail?
01:09fdobridge: <karolherbst🐧🦀> or well.. I guess we'll fail on alloc then
01:09fdobridge: <gfxstrand> No
01:10fdobridge: <Esdras Tarsis> modifying the config file
01:11fdobridge: <CME> ah, will try tomorrow
01:12fdobridge: <gfxstrand> @karolherbst🐧 What does BO_GART do?
01:13fdobridge: <karolherbst🐧🦀> puts the memory into sysmem
01:13fdobridge: <Esdras Tarsis> When Talos doesn't recognize your GPU it uses the medium setting, so theoretically you can lower texture quality to lower VRAM usage
01:14fdobridge: <gfxstrand> @karolherbst🐧 Can we set GART and LOCAL at the same time?
01:14fdobridge: <karolherbst🐧🦀> well.. hardware says yes, driver says no
01:14fdobridge: <karolherbst🐧🦀> and without resizable BAR it's quite useless
01:14fdobridge: <karolherbst🐧🦀> you see this 256MB area nvidia advertises as.. ehhh.. what were the flags again?
01:15fdobridge: <karolherbst🐧🦀> that's GART | LOCAL stuff
01:15fdobridge: <gfxstrand> I want something that's VRAM but I promise never to map it so it's ok to put in system RAM if we run out of space.
01:15fdobridge: <karolherbst🐧🦀> ahh yes
01:15fdobridge: <karolherbst🐧🦀> soo
01:16fdobridge: <karolherbst🐧🦀> the way you do it now is that you don't add it to the list of bos when submitting your job and then nouveau pages it out into system memory
01:16fdobridge: <karolherbst🐧🦀> but it's only doing so for unused memory
01:16fdobridge: <karolherbst🐧🦀> there are also buffers where the GPU doesn't care if it's in system memory or VRAM
01:16fdobridge: <karolherbst🐧🦀> and either way works
01:17fdobridge: <gfxstrand> How do I specify that?
01:17fdobridge: <karolherbst🐧🦀> so what I think I'd like to see is that we can mark certain buffers as "it should be in VRAM, but if it has to be, you can also page it out into system memory"
01:17fdobridge: <karolherbst🐧🦀> you don't
01:17fdobridge: <gfxstrand> Where it accesses from system memory across PCIe if needed
01:17fdobridge: <gfxstrand> Yeah, that's a problem.
01:17fdobridge: <karolherbst🐧🦀> I know
01:17fdobridge: <karolherbst🐧🦀> it's just not supported
01:17fdobridge: <gfxstrand> That's not a problem we can fix in NVK
01:17fdobridge: <gfxstrand> Not really
01:17fdobridge: <karolherbst🐧🦀> correct
01:18fdobridge: <karolherbst🐧🦀> we just need two things really from the kernel driver here
01:19fdobridge: <karolherbst🐧🦀> 1. mappable and _readable_ VRAM (through PCIe BAR + resizable BAR so it's not limited to 256MB)
01:19fdobridge: <karolherbst🐧🦀> 2. marking certain buffers as allowed to be paged out into sys RAM on high memory pressure, but still as available to the GPU for jobs
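[Editor's sketch] The second kernel requirement listed above ("prefer VRAM, but allowed to spill into system RAM under pressure") can be modelled as a tiny placement decision. This is a toy in Python, not nouveau code: the log says the driver does not support this today, and `choose_placement` and the `may_spill` flag are hypothetical names invented for illustration.

```python
def choose_placement(size, vram_free, may_spill):
    """Pick where a buffer would live under the policy wished for in the log.

    Returns "vram", "sysmem", or None when the buffer cannot be placed.
    `may_spill` is a hypothetical per-buffer flag, not an existing uAPI bit.
    """
    if size <= vram_free:
        return "vram"
    if may_spill:
        # Still usable by GPU jobs, just accessed across PCIe.
        return "sysmem"
    return None
```

With such a flag, a buffer that does not fit would degrade to slower memory instead of failing the whole submission.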
01:21fdobridge: <karolherbst🐧🦀> but I suspect we are just not super careful about memory allocations or something ... dunno... maybe I spend some time figuring out if we leak memory or just allocate things we don't strictly need...
03:48fdobridge: <gfxstrand> It's possible we're leaking. Shouldn't be too hard to check.
04:16fdobridge: <gfxstrand> Doing a BO leak check run now
04:19fdobridge: <gfxstrand> Doesn't look like we're leaking but I'll let the run finish.
04:25fdobridge: <gfxstrand> Yeah, half-way through a run with no significant crashes on `assert(dev->bo_cnt == 0)` in `nouveau_ws_device_destroy()`. I don't think we're leaking.
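[Editor's sketch] The leak check mentioned above amounts to a live-BO counter that must be back at zero when the device is torn down. A minimal Python model of that idea follows. Only `bo_cnt` and the assert in `nouveau_ws_device_destroy()` come from the log; `ToyDevice`, `bo_new` and `bo_destroy` are made-up names, and the real code is C in Mesa.

```python
class ToyDevice:
    """Toy stand-in for a winsys device; not the real nouveau_ws_device."""

    def __init__(self):
        self.bo_cnt = 0  # number of buffer objects currently alive

    def bo_new(self, size):
        self.bo_cnt += 1
        return {"size": size, "alive": True}

    def bo_destroy(self, bo):
        assert bo["alive"], "BO destroyed twice"
        bo["alive"] = False
        self.bo_cnt -= 1

    def destroy(self):
        # Same condition as the assert quoted in the log:
        # any BO still alive at this point is a leak.
        assert self.bo_cnt == 0, f"{self.bo_cnt} BO(s) leaked"
```

Running a test suite with such an assert enabled turns every leaked BO into a crash at device teardown, which is why "no significant crashes" is read as "no leaks".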
08:28OftenTimeConsuming: RSpliet: >nah, I upgraded to AMD. Sorry, not sorry You'd be sorry having to load up proprietary software before the card works.
14:43fdobridge: <gfxstrand> Yup, no leaks
15:53fdobridge: <karolherbst🐧🦀> yeah well.. annoying 😢
15:54fdobridge: <karolherbst🐧🦀> is nvk doing anything where we might allocate more memory than we actually need?
15:54fdobridge: <karolherbst🐧🦀> but.... mhh
15:55fdobridge: <karolherbst🐧🦀> @gfxstrand do you think we should track the reason a bo was created? like a string just indicating where it's coming from (or enum value or _something_) and then if we try to submit a batch >= vram_size we can dump the info and easily tell why stuff got allocated the most or something?
15:57fdobridge: <karolherbst🐧🦀> though I can see why that's harder to do in vulkan than e.g. gl
15:57fdobridge: <karolherbst🐧🦀> and might not even help
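[Editor's sketch] The idea floated above (tag each BO with the reason it was created, then dump a per-reason breakdown when a batch reaches `vram_size`) could look roughly like this. It is a Python toy under stated assumptions: `batch_report` and the `(reason, size)` pair layout are hypothetical, and nothing like it exists in NVK according to the log.

```python
from collections import Counter


def batch_report(bos, vram_size):
    """Summarise a batch that is too large for VRAM.

    `bos` is an iterable of (reason, size_in_bytes) pairs, where `reason`
    is the hypothetical "why was this BO created" tag from the log.
    Returns None if the batch fits, otherwise a list of
    (reason, total_bytes) pairs sorted from largest to smallest.
    """
    by_reason = Counter()
    for reason, size in bos:
        by_reason[reason] += size
    if sum(by_reason.values()) < vram_size:
        return None
    return by_reason.most_common()
```

An enum would be cheaper than a string in a real driver; strings are used here only to keep the sketch readable.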
16:07fdobridge: <gfxstrand> I guess we could.
16:08fdobridge: <gfxstrand> I'm really not surprised that you'd run out of memory on a 2GB card, though. We really should focus on fixing the kernel.
16:08fdobridge: <gfxstrand> I mean, yeah, we can debug why this game is burning memory but lots of games blow past 2GB.
16:21fdobridge: <CME> forsaken also hits `[ 1620.025931] nouveau 0000:01:00.0: ForsakenEx[12492]: nv50cal_space: -16` in the main menu, with the same settings the game only eats 146MB vram on my 1060
16:24fdobridge: <karolherbst🐧🦀> yeah.. but that game doesn't really need much
16:24fdobridge: <karolherbst🐧🦀> min spec is like 1 GB VRAM
16:25fdobridge: <gfxstrand> We could make NVK refuse to allocate more VRAM than exists on the card and see if that does anything.
16:25fdobridge: <karolherbst🐧🦀> mhh
16:25fdobridge: <karolherbst🐧🦀> I can try that out on the weekend, I own the game and I own enough low spec cards 😄
16:26fdobridge: <karolherbst🐧🦀> but yeah.. I think refusing to allocate is probably a good idea. We might also want to check if our accounting is in order. Not that we run out of VRAM if the app thinks only 500MB are in use or something
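[Editor's sketch] "Refuse to allocate more VRAM than exists" plus "check that our accounting is in order" can be modelled with a simple budget tracker. This is a Python toy, not NVK code: `VramBudget` and its methods are hypothetical names. In a Vulkan driver the refusal would presumably surface as `VK_ERROR_OUT_OF_DEVICE_MEMORY`, but that mapping is an assumption here, not something stated in the log.

```python
class VramBudget:
    """Tracks bytes handed out against a fixed VRAM size."""

    def __init__(self, vram_size):
        self.vram_size = vram_size
        self.used = 0

    def alloc(self, size):
        """Reserve `size` bytes; return False instead of oversubscribing."""
        if self.used + size > self.vram_size:
            return False
        self.used += size
        return True

    def free(self, size):
        # Catches accounting bugs: we can never free more than we handed out.
        assert size <= self.used, "freeing more than was allocated"
        self.used -= size
```

Comparing `used` against what the application believes it has allocated is one way to spot the "app thinks only 500MB are in use" mismatch described above.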
16:35fdobridge: <gfxstrand> I wonder if we're actually running into resizable BAR issues.
16:35fdobridge: <gfxstrand> It may not be that we're running out of VRAM, just CPU mappable VRAM.
16:35fdobridge: <gfxstrand> That's a much smaller resource
16:35fdobridge: <gfxstrand> Or did you tell me that nouveau doesn't support that at all?
17:11fdobridge: <karolherbst🐧🦀> we can't read from it
17:12fdobridge: <karolherbst🐧🦀> unless the driver paged it out to system memory
17:12fdobridge: <karolherbst🐧🦀> but we map bos to write to them and that seems to work just fine
17:33fdobridge: <gfxstrand> Ok, for stuff like descriptor tables, that's fine. For anything exposed to the client, not so much.
17:33fdobridge: <gfxstrand> The batch stuff also isn't currently guaranteed read-free.
17:34fdobridge: <gfxstrand> descriptor tables and shader programs are, though.
17:39fdobridge: <karolherbst🐧🦀> sure, but local mem isn't mappable, so that's fine
17:45fdobridge: <gfxstrand> We want to make some of it mappable eventually
17:45fdobridge: <gfxstrand> If we can
17:45fdobridge: <gfxstrand> But we don't need to today
17:45fdobridge: <gfxstrand> @airlied where do I find a kernel branch with the new uAPI?
17:46fdobridge: <karolherbst🐧🦀> the
17:51fdobridge: <karolherbst🐧🦀> yeah.. we really want to advertise VRAM like nvidia does. They just add a local + mappable 256MB memory type, but...
17:51fdobridge: <karolherbst🐧🦀> doesn't matter all that much
18:13fdobridge: <gfxstrand> Oh, yes it matters
18:13fdobridge: <gfxstrand> That's like a 20% perf bump for a bunch of DXVK stuff.
18:15fdobridge: <karolherbst🐧🦀> I see
18:15fdobridge: <karolherbst🐧🦀> I guess that's for very small reads and stuff?
18:15fdobridge: <gfxstrand> vertex etc. upload.
18:15fdobridge: <gfxstrand> Any stream upload, really.
18:16fdobridge: <gfxstrand> WC map, write directly to VRAM, do the thing.
18:16fdobridge: <karolherbst🐧🦀> mhh
18:16fdobridge: <karolherbst🐧🦀> yeah.. I guess that's the data which all fits within 256MB
18:17fdobridge: <karolherbst🐧🦀> though if we add it, we might want to support resizable BAR while at it anyway
18:17fdobridge: <karolherbst🐧🦀> I suspect nobody was looking into it for now, not sure if it's on @airlied radar or if we have to ping Ben about it
18:20fdobridge: <Esdras Tarsis> vkcube crashed my xorg session :ferris_happy:
18:44fdobridge: <gfxstrand> 😬
19:05fdobridge: <gfxstrand> Does pre-Turing have some weird render target size alignment or something? 🤔
19:13fdobridge: <airlied> https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next
19:14fdobridge: <karolherbst🐧🦀> probably, maybe it's page aligned, or 0x100 like a lot of other stuff
19:15airlied: jekstrand: my new uapi branch has some alignment changes
19:16fdobridge: <gfxstrand> All my addresses look mostly fine
19:17fdobridge: <gfxstrand> They're aligned to like 2^20
20:07fdobridge: <gfxstrand> Oh, I know what's going on. 😅
20:07fdobridge: <gfxstrand> maybe
20:13fdobridge: <gfxstrand> Yup. We were allocating shadow BOs for linear images on Maxwell.
20:13fdobridge: <gfxstrand> That should fix a bunch of tests.
20:13fdobridge: <gfxstrand> Can't wait until I can delete the shadow BO stuff.
20:35fdobridge: <karolherbst🐧🦀> uhh
20:35fdobridge: <karolherbst🐧🦀> we need the new UAPI to delete those BOs?
20:56fdobridge: <CME> hmm, tried launching forsaken for the 3rd time and `[22707.909287] nouveau 0000:01:00.0: Xwayland[3535]: nv50cal_space: -16`
20:57fdobridge: <CME> now the kernel oopsed and gnome is just a corrupted mess 😄
21:21fdobridge: <gfxstrand> Yup. Anything with a non-zero PTE kind needs to be bound sparse.
21:21fdobridge: <gfxstrand> ```
21:21fdobridge: <gfxstrand> Pass: 218886, Fail: 9757, Crash: 2234, Warn: 4, Skip: 1328585, Flake: 1097, Duration: 36:21
21:21fdobridge: <gfxstrand> ```
21:21fdobridge: <gfxstrand> That's a bit better. 😄
21:21fdobridge: <karolherbst🐧🦀> heh
21:44fdobridge: <airlied> @gfxstrand not sure we have that requirement well dealt with in the kernel interface, maybe it just works
21:46fdobridge: <gfxstrand> Which requirement?
21:47fdobridge: <airlied> non-zero PTE kind bound as sparse
21:48fdobridge: <airlied> would be good to try and wire that up on the new uapi
21:48fdobridge: <gfxstrand> Yeah, we should.
21:48fdobridge: <gfxstrand> Really the only thing the uAPI needs is support for specifying the PTE kind in the binding.
21:49fdobridge: <gfxstrand> But we should get rid of the shadow BOs and do it "properly" in your branch and prove it out.
22:57airlied: Pass: 598536, Fail: 966, Crash: 2, Warn: 11, Skip: 1313606, Flake: 1, Duration: 4:09:30, Remaining: 0
22:57airlied: on the new uapi
22:58airlied: on my tu104
23:04airlied: oh maybe that was lvp :-P
23:04airlied: ah yes lots of traps now
23:34fdobridge: <gfxstrand> Ok, so blit (draw) to 3D works.... just on every 4th slice. WTF?
23:50fdobridge: <karolherbst🐧🦀> eh
23:50fdobridge: <mhenning> is it in color? mixing up planar and interleaved rgba? or something?
23:51fdobridge: <mhenning> (taking shots in the dark here)
23:51fdobridge: <gfxstrand> It's a normal R8G8B8A8 3D image.
23:52fdobridge: <gfxstrand> Slices 0, 4, 8, and 12, all the others are wrong (it's got 16 total)
23:52fdobridge: <gfxstrand> where by "wrong" I mean nothing rendered at all
23:52airlied: is 4 got 4 or has 4 got 1?
23:53fdobridge: <gfxstrand> 4 has 4
23:53fdobridge: <gfxstrand> I thought of that. 😅