00:00anholt: v6.3.13-for-mesa-ci-bbe75e512c76
00:02karolherbst: you have that helper invocation patch, right?
00:10anholt: do you expect lack of helper inv patch to flake all rendering tests? I feel like you've had me mess with it before.
00:13karolherbst: mhhh... unlikely though. Kinda sounds more like a memory coherency problem
00:14karolherbst: or something like that
00:15benjaminl: ime there are tests that succeed without the patch because the registers that it's loading descriptors into start out with the right value by coincidence
00:21anholt: hmm. we call our heap coherent, but we're not setting NOUVEAU_GEM_DOMAIN_COHERENT?
00:24karolherbst: in gl we don't use it either
00:24karolherbst: well.. for fences we do I guess
00:27karolherbst: anholt: mind checking what happens here? https://gitlab.freedesktop.org/nouveau/mesa/-/blob/nvk/main/src/nouveau/winsys/nouveau_device.c#L246
00:29anholt: device->vram_size should be zero.
00:29karolherbst: well.. should be, but who knows
00:44fdobridge: <karolherbst🐧🦀> yeah okay.. something is busted with Ada
00:44fdobridge: <karolherbst🐧🦀> "gsp: Xid:13 Graphics SM Warp Exception on (GPC 0, TPC 0, SM 0): Misaligned Register"
00:44fdobridge: <mhenning> prime + gsp worked well enough for glxgears on ampere last time I tried it (but not well enough for a cts run)
00:45fdobridge: <karolherbst🐧🦀> yeah.. looks like Ada actually requires some work even though the 3D support is literally the same as the one from ampere 😄
00:46fdobridge: <karolherbst🐧🦀> I should also ask for ISA docs for Ada 🙃
00:54fdobridge: <karolherbst🐧🦀> something in those shaders is wrong https://gist.githubusercontent.com/karolherbst/86a016683abf9a17f1b65398b44b4638/raw/dcb380558db8ab638fefe38552182962552262a2/gistfile1.txt
00:55fdobridge: <karolherbst🐧🦀> I wonder if tex requires stricter alignment...
00:55fdobridge: <karolherbst🐧🦀> `2: tex 2D $r8 $s0 rgba f32 { $r0 $r1 $r2 $r3 } $r0 $r1 (16)`
00:56fdobridge: <karolherbst🐧🦀> huh...
00:56fdobridge: <karolherbst🐧🦀> how does that get translated to `TEX R1, R0, R0, R1, 0xf, 0x8, 2D ;`
00:58fdobridge: <karolherbst🐧🦀> ` 2: tex 2D $r8 $s0 rgba f32 { $r0d $r2d } $r0d (16)` on turing.... huh
00:58fdobridge: <karolherbst🐧🦀> did I forget to enable ada somewhere? 😄
01:00fdobridge: <karolherbst🐧🦀> ohh soooo
01:00fdobridge: <karolherbst🐧🦀> *shooo
01:03fdobridge: <karolherbst🐧🦀> it renders :3
01:04fdobridge: <karolherbst🐧🦀> let's see how bad a CTS run is
01:07fdobridge: <karolherbst🐧🦀> "Pass: 8731, Fail: 37, Crash: 1, Warn: 2, Skip: 728, Flake: 1, Duration: 1:47, Remaining: 19:39"
01:07fdobridge: <karolherbst🐧🦀> not bad so far
01:10fdobridge: <karolherbst🐧🦀> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24226 🙃
01:10fdobridge: <karolherbst🐧🦀> low effort enablement
01:18fdobridge: <airlied> okay fixed external fd support in new-uapi
01:21fdobridge: <gfxstrand> Building...
01:35fdobridge: <karolherbst🐧🦀> the thing I like the most about GSP is, that we finally get some more useful error messages 😄
01:35fdobridge: <karolherbst🐧🦀> `Pass: 105202, Fail: 472, Crash: 14, Warn: 17, Skip: 8229, Timeout: 37, Flake: 25, Duration: 24:09, Remaining: 0`
01:35fdobridge: <karolherbst🐧🦀> @gfxstrand I think we can just enable Ada ....
01:36fdobridge: <karolherbst🐧🦀> (in nvk)
01:36fdobridge: <airlied> you should find that ampere bug I was seeing with the SKED error
01:36fdobridge: <karolherbst🐧🦀> ahhh.. good idea
01:37fdobridge: <karolherbst🐧🦀> @airlied which test was it?
01:37fdobridge: <karolherbst🐧🦀> though I guess that was vulkan
01:37fdobridge: <airlied> yeah it was vulkan
01:38fdobridge: <airlied> dEQP-VK.pipeline.monolithic.spec_constant.compute.expression.array_size
01:38fdobridge: <karolherbst🐧🦀> mhh, the only errors I'm seeing with GL are out of range ones
01:38fdobridge: <karolherbst🐧🦀> okay.. I'll do the vulkan enablement tomorrow
01:38fdobridge: <karolherbst🐧🦀> kinda funky that a 9 loc patch is all it takes for GL
01:38fdobridge: <airlied> uggh build a kernel with lots of debug turned on to find bugs, run CTS, 18 hours later 😛
01:39fdobridge: <karolherbst🐧🦀> uhhh...
01:39fdobridge: <karolherbst🐧🦀> the thing is.. I use USB-PD to charge this laptop even though it needs like... 180W AC 😄
01:39fdobridge: <airlied> probably should build a second kernel
01:47fdobridge: <gfxstrand> Nice!
01:48fdobridge: <karolherbst🐧🦀> does reclocking work with GSP?
01:48fdobridge: <karolherbst🐧🦀> I uhm... might want to do some fancy benchmarks with nvk 😄
01:48fdobridge: <gfxstrand> @airlied Does that branch have GSP as well?
01:48fdobridge: <airlied> nope
01:49fdobridge: <gfxstrand> That'd be cool. 😁 I expect it to be kinda crappy relative to the blob because we've not optimized a thing and UBOs are horrible but it should run kinda okay.
01:49fdobridge: <karolherbst🐧🦀> yeah, it's more of a "look, with GSP and nvk you can finally do actual gaming with nouveau"
01:49fdobridge: <karolherbst🐧🦀> even if it's like only 20% as fast, it's better than the current 1%
01:50fdobridge: <karolherbst🐧🦀> that laptop I got here is seriously overspeced
01:50fdobridge: <airlied> I did a talos video a while back already showing that :-)
01:50fdobridge: <airlied> not sure it showed you could game though :-P, since it still seemed overly slow
01:51fdobridge: <karolherbst🐧🦀> mhhh
01:51fdobridge: <karolherbst🐧🦀> but was that with an RTX 5000 Ada
01:51fdobridge: <gfxstrand> We do a full stall on every pipeline barrier....
01:51fdobridge: <gfxstrand> That's not helping anyone. 😅
01:52fdobridge: <karolherbst🐧🦀> ehh seems like that GPU is like an RTX 4070 Ti
01:52fdobridge: <karolherbst🐧🦀> or 4080
01:52fdobridge: <karolherbst🐧🦀> should show some more impressive fps
01:52fdobridge: <karolherbst🐧🦀> I'm sure I can just.. uhm.. disable that
01:53fdobridge: <karolherbst🐧🦀> ohh wait.. I can do some low effort benchmarking
01:53fdobridge: <karolherbst🐧🦀> but I gotta connect AC for that
01:56fdobridge: <gfxstrand> I mean... we need some WFIs...
01:56fdobridge: <karolherbst🐧🦀> ehhh.. we'll figure it out until XDC
01:56fdobridge:<gfxstrand> pulls @airlied's branch
01:57fdobridge: <airlied> keep up to date with it, I'm still closing the gap since the last rebase on master
02:00fdobridge: <karolherbst🐧🦀> looks like pixmark piano crashed the system 🥲
02:01fdobridge: <airlied> also -Dnvk-experimental-uapi=true
02:02fdobridge: <karolherbst🐧🦀> yeah huh... somehow running things besides the CTS and glxgears makes the machine crash
02:03fdobridge: <airlied> looks like d32s8 has some fallout from rebasing, will track it down
02:05fdobridge: <karolherbst🐧🦀> funky.. unigine heaven runs
02:05fdobridge: <karolherbst🐧🦀> but kinda slow
02:06fdobridge: <gfxstrand> Okay, first new uAPI CTS run going
02:07fdobridge: <karolherbst🐧🦀> 2560x1600 ultra with 8x AA is like 30 fps
02:07fdobridge: <gfxstrand> Not bad!
02:07fdobridge: <karolherbst🐧🦀> maybe I should build mesa as release...
02:07fdobridge: <karolherbst🐧🦀> the CPU is kinda busy
02:07fdobridge: <gfxstrand> Never mind... I need to re-calibrate. I'm thinking Intel. 😂
02:07fdobridge: <karolherbst🐧🦀> 😄
02:08fdobridge: <karolherbst🐧🦀> ahh yeah, let me run that with intel actually 😄
02:09fdobridge: <karolherbst🐧🦀> 10 fps on intel
02:10fdobridge: <karolherbst🐧🦀> yeah mhhh
02:10fdobridge: <karolherbst🐧🦀> dunno if reclocking works actually here 😄
02:11fdobridge: <karolherbst🐧🦀> mhhh
02:11fdobridge: <karolherbst🐧🦀> CPU is still at 100%
02:12fdobridge: <karolherbst🐧🦀> I wonder if we are doing something dumb...
02:12fdobridge: <karolherbst🐧🦀> but the GPU is also running hot
02:12fdobridge: <karolherbst🐧🦀> I'm sure nvk is 100x faster, but that's for tomorrow to find out
02:14fdobridge: <karolherbst🐧🦀> I should sleep I have a meeting in like... 10 hours
02:15fdobridge: <gfxstrand> Oh, I'm sure it is
02:20fdobridge: <airlied> okay pushed a fix for the d32s8 fails
02:21fdobridge: <gfxstrand> kk
02:21fdobridge: <gfxstrand> I'm running the CTS now
02:21fdobridge: <gfxstrand> How do I know that it's using the new uAPI?
02:24fdobridge: <airlied> does vulkaninfo show sparseResidency?
02:24fdobridge: <airlied> sorry sparseBinding
02:24fdobridge: <airlied> and timelineSemaphore
02:26fdobridge: <gfxstrand> Yeah
02:27fdobridge: <gfxstrand> Okay, that'll tell me. 😄
02:32fdobridge: <gfxstrand> Okay, missing the meson flag.
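The check airlied suggests can be scripted. The following is a minimal sketch, assuming `vulkaninfo` prints feature lines in the form `name = true`; the helper name `uses_new_uapi` is made up for illustration and is not part of Mesa or NVK.

```python
import re

# Features that, per the discussion above, only show up with the new uAPI.
NEW_UAPI_FEATURES = ("sparseBinding", "timelineSemaphore")


def uses_new_uapi(vulkaninfo_text: str) -> bool:
    """Return True if every marker feature is reported as true.

    Assumes vulkaninfo prints lines such as 'sparseBinding = true'.
    """
    return all(
        re.search(rf"\b{name}\s*=\s*true\b", vulkaninfo_text)
        for name in NEW_UAPI_FEATURES
    )


# Hand-written sample text, not captured from a real device.
sample = """
    sparseBinding                           = true
    timelineSemaphore                       = true
"""
print(uses_new_uapi(sample))
```

In practice the text would come from something like `subprocess.run(["vulkaninfo"], capture_output=True, text=True).stdout`.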
02:34fdobridge: <airlied> I expect there might be a few regressions sitting around, since planes kinda was a start from scratch moment
02:35fdobridge: <gfxstrand> heh
02:35fdobridge: <gfxstrand> If the KMD is stable, I'll probably spend some time this week playing with it.
02:35fdobridge: <gfxstrand> How close to merging are we on the kernel side? If I review the API are we good?
02:36fdobridge: <airlied> yeah I think the gpuva stuff is acked for landing, so it's just the nouveau side which we want to ack
02:37fdobridge: <airlied> not sure how careful we want to be about exposing the new uapi, it's probably fine since nvk is the only user
02:37fdobridge: <gfxstrand> Okay. Cool
02:37fdobridge: <airlied> so we've got a fair few weeks until a Linus kernel will have it
02:37fdobridge: <gfxstrand> I'll focus on that this week.
02:37fdobridge: <gfxstrand> I would love to merge NVK into mesa/main
02:38fdobridge: <airlied> we'd have to burn all the old uapi bits out as well, (or keep them in a branch)
02:39fdobridge: <gfxstrand> Yup
02:39fdobridge: <airlied> I'll try and get gsp/newuapi crossover going on my ampere now, since it can actually run cts in less than a day
02:40fdobridge: <gfxstrand> Yeah, we should make sure that's okay too
02:40fdobridge: <gfxstrand> I could test that, too, in theory.
02:45fdobridge: <gfxstrand> vulkaninfo: ../src/nouveau/winsys/nouveau_bo.c:39: bo_bind: Assertion `ret == 0' failed.
02:45fdobridge: <airlied> yeah there's usually a bit of impedance rematching between the two threads of development before it works
02:45fdobridge: <airlied> for everything?
02:45fdobridge: <gfxstrand> At least I know it's the right branch! 😅
02:47fdobridge: <airlied> was that vulkaninfo? 🙂
02:47fdobridge: <gfxstrand> Yeah
02:48fdobridge: <airlied> also NVK_DEBUG=vm is a thing, but vulkaninfo usually works for me
02:49fdobridge: <gfxstrand> ```
02:49fdobridge: <gfxstrand> alloc vma 2b000 1000 sparse: 1
02:49fdobridge: <gfxstrand> vm bind failed 22
02:49fdobridge: <gfxstrand> ```
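The `22` in the debug output is presumably an errno value returned by the bind ioctl (an assumption; the log does not say so explicitly). A quick way to decode such a number:

```python
import errno
import os

code = 22  # the value printed in the NVK_DEBUG=vm output above

# errno.errorcode maps the number to its symbolic name,
# os.strerror gives the human-readable description.
print(errno.errorcode[code], "-", os.strerror(code))
```

On Linux this decodes to `EINVAL`, which would fit the kernel rejecting the bind arguments rather than running out of memory.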
02:50fdobridge: <airlied> on turing?
02:50fdobridge: <gfxstrand> yup
02:52fdobridge: <airlied> I'll build the same kernel branch as you did to check
02:55fdobridge: <airlied> what nvk branch did you build?
02:55fdobridge: <airlied> just in case you got some old ass uapi
02:57fdobridge: <airlied> e791b06a is the latest
02:57fdobridge: <gfxstrand> `git fetch https://gitlab.freedesktop.org/nouvelles/kernel/ new-uapi-drm-next`
02:58fdobridge: <gfxstrand> 46a6a880babcbe56c3a2ce9ed44aca718fc7dc1d
02:58fdobridge: <gfxstrand> + karol's patch
03:00fdobridge: <gfxstrand> As per this
03:00fdobridge: <esdrastarsis> on gsp ada?
03:03fdobridge: <airlied> I'm just building that kernel branch now
03:11fdobridge: <airlied> okay seems fine here, you confirm the mesa branch is as above?
03:14fdobridge: <airlied> it might be some pte_kind related stuff though
03:15fdobridge: <airlied> @gfxstrand in nouveau_bo.c can you bump the (1 << 16) to (1 << 21)
03:15fdobridge: <airlied> line 265
03:15fdobridge: <airlied> or around there
03:34fdobridge: <gfxstrand> Not at the moment but after a bit
03:46fdobridge: <gfxstrand> Nope
03:48fdobridge: <gfxstrand> @airlied Fresh pulled the mesa branch from your MR
03:48fdobridge: <gfxstrand> e791b06a758ef4e8e75200c882fa03645fc94628
03:49fdobridge: <gfxstrand> Same error
03:49fdobridge: <gfxstrand> Hrm... Maybe it's trying on Maxwell
03:49fdobridge: <gfxstrand> I've got both cards plugged in after all
03:51fdobridge: <gfxstrand> Okay, yeah, it was the maxwell
03:52fdobridge: <gfxstrand> CTSing Turing now
03:53fdobridge: <airlied> okay I've only tried turing/ampere
03:53fdobridge: <gfxstrand> I'll poke at stuff once I get a Turing baseline
03:54fdobridge: <gfxstrand> I also want to poke about in the patches and see how I feel about the new paths.
03:54fdobridge: <gfxstrand> i.e. review, but with running stuff and tweaking the code as it strikes my fancy
03:54fdobridge: <gfxstrand> So far CTS seems to be taking longer and that's a little concerning.
03:55fdobridge: <gfxstrand> Oh, that could be because tests run now that didn't before. 🤔
03:59fdobridge: <gfxstrand> It also seems to be failing a bit more but I'm less than 10 min into the run. Hopefully I'll have results in the morning.
04:09fdobridge: <gfxstrand> It's bedtime soon so I'm not going to really look any more tonight. I'm just going to hope my kernel survives the run and look in the morning.
04:14fdobridge: <airlied> cool
04:14fdobridge: <airlied> I just got ampere gsp/new-uapi to boot, was a few hoops to jump through
04:23fdobridge: <gfxstrand> Woo
05:21fdobridge: <airlied> Pass: 378177, Fail: 2670, Crash: 471, Skip: 1633386, Flake: 539, Duration: 1:06:34, Remaining: 0 is my ampere/gsp/new-uapi run
05:29fdobridge: <esdrastarsis> Is turing working on gsp now?
06:42fdobridge: <gfxstrand> Similar only with the added bonus of kernel bugs
06:45fdobridge: <gfxstrand> ```
06:45fdobridge: <gfxstrand> [14384.793943] watchdog: BUG: soft lockup - CPU#13 stuck for 5696s! [gnome-shell:1626]
06:45fdobridge: <gfxstrand> ...
06:45fdobridge: <gfxstrand> [14300.792861] Call Trace:
06:45fdobridge: <gfxstrand> [14300.792862] <IRQ>
06:45fdobridge: <gfxstrand> [14300.792862] ? watchdog_timer_fn+0x1a8/0x210
06:45fdobridge: <gfxstrand> [14300.792864] ? __pfx_watchdog_timer_fn+0x10/0x10
06:45fdobridge: <gfxstrand> [14300.792865] ? __hrtimer_run_queues+0x10f/0x2b0
06:45fdobridge: <gfxstrand> [14300.792867] ? hrtimer_interrupt+0xf8/0x230
06:45fdobridge: <gfxstrand> [14300.792869] ? __sysvec_apic_timer_interrupt+0x5e/0x130
06:45fdobridge: <gfxstrand> [14300.792871] ? sysvec_apic_timer_interrupt+0x6d/0x90
06:45fdobridge: <gfxstrand> [14300.792872] </IRQ>
06:45fdobridge: <gfxstrand> [14300.792872] <TASK>
06:45fdobridge: <gfxstrand> [14300.792873] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
06:45fdobridge: <gfxstrand> [14300.792876] ? ioread32+0x34/0x60
06:45fdobridge: <gfxstrand> [14300.792878] nouveau_dma_wait+0x3a1/0x6d0 [nouveau]
06:45fdobridge: <gfxstrand> [14300.792984] nouveau_gem_ioctl_pushbuf+0x1688/0x1b00 [nouveau]
06:45fdobridge: <gfxstrand> [14300.793098] ? __pfx_nouveau_gem_ioctl_pushbuf+0x10/0x10 [nouveau]
06:45fdobridge: <gfxstrand> [14300.793209] drm_ioctl_kernel+0xca/0x170
06:45fdobridge: <gfxstrand> [14300.793210] drm_ioctl+0x26d/0x4b0
06:46fdobridge: <gfxstrand> [14300.793212] ? __pfx_nouveau_gem_ioctl_pushbuf+0x10/0x10 [nouveau]
06:46fdobridge: <gfxstrand> [14300.793324] nouveau_drm_ioctl+0x5a/0xb0 [nouveau]
06:46fdobridge: <gfxstrand> [14300.793435] __x64_sys_ioctl+0x91/0xd0
06:46fdobridge: <gfxstrand> [14300.793437] do_syscall_64+0x5d/0x90
06:46fdobridge: <gfxstrand> [14300.793438] ? exc_page_fault+0x7f/0x180
06:46fdobridge: <gfxstrand> [14300.793440] entry_SYSCALL_64_after_hwframe+0x72/0xdc
06:46fdobridge: <gfxstrand> ```
06:46fdobridge: <gfxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1131114596451766293/message.txt
06:48fdobridge: <airlied> I think that's just a scheduling problem, not sure it's a real issue
06:49fdobridge: <airlied> since at least the instance you give there is a legacy ABI call
06:50fdobridge: <airlied> @esdrastarsis no idea, ampere worked for me, but I haven't got all the turing pieces lined up for ben's latest work
06:57fdobridge: <gfxstrand> IDK if that's the issue but my whole machine locked up with one test group left to complete.
06:57fdobridge: <gfxstrand> I'm trying another run.
06:58fdobridge: <airlied> there might be some interaction between a desktop running on legacy and a CTS running that we haven't seen
06:58fdobridge: <airlied> though I've left gdm going on my ampere
07:00fdobridge: <gfxstrand> I shut gdm off for now
07:00fdobridge: <gfxstrand> Just in case
07:19fdobridge: <airlied> Pushed a couple of minor new uapi regression fixes. Are you seeing any big ones?
07:39fdobridge: <gfxstrand> I haven't gotten through a run yet
07:41fdobridge: <airlied> I'll start my turing non-gsp mode run now, should be finished tomorrow sometime
08:11fdobridge: <gfxstrand> I just kicked off another after my last run died in a fire
08:46fdobridge: <karolherbst🐧🦀> yeah
08:57fdobridge: <gfxstrand> nouveau_sched locked up again. 🙄
08:59fdobridge: <gfxstrand> RCU is giving me red text. That's bad....
09:01fdobridge: <gfxstrand> I'm gonna reboot
09:07fdobridge: <airlied> Okay copies of red text would be good, I'll see if mine throws anything weird
09:18fdobridge: <gfxstrand> I've not been able to get through a full run yet. 😭
09:18fdobridge: <gfxstrand> I really should try sleeping again.
09:18fdobridge: <gfxstrand> I've heard that it's good for you
09:20fdobridge: <marysaka> :nya_panic:
11:32fdobridge: <karolherbst🐧🦀> @gfxstrand do you plan to rebase on top of `mesa/main` in the near future? I've landed the Ada enablement which also has trivial codegen patches.
11:37fdobridge: <karolherbst🐧🦀> @gfxstrand also mind sharing the exact command, the git hash you run the VK CTS on and your most recent `nvk/main` failures.csv (and the commit you ran it on) so I can diff it here? If you don't have anything recent I'm just going to run it through Turing/Ampere myself or something
13:19fdobridge: <gfxstrand> yay! New uAPI run finally completed successfully! \o/
13:22fdobridge: <gfxstrand> Rebasing now
13:23fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Now pour some GSP sauce
14:04fdobridge: <karolherbst🐧🦀> It looks like nvk just works without any changes on ada 🙃
14:04fdobridge: <karolherbst🐧🦀> at least vkcube runs
14:05fdobridge: <karolherbst🐧🦀> there is a trivial patch to fix the sm value though, but whatever
14:05fdobridge: <karolherbst🐧🦀> apparently hopper (0x180) is SM90 where Ada (0x19x) is SM89
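The SM fix mentioned here boils down to a chipset-to-SM lookup. A sketch using only the two data points from the chat (Hopper `0x180` is SM90, Ada `0x19x` is SM89); the function name is hypothetical and the real driver code is C, not Python:

```python
def sm_for_chipset(chipset: int) -> int:
    """Map a chipset ID to an SM version for the two families named above."""
    family = chipset & 0x1F0  # drop the low nibble: 0x192 -> 0x190
    if family == 0x190:  # Ada
        return 89
    if family == 0x180:  # Hopper
        return 90
    # Other families are deliberately left out; the chat gives no values.
    raise ValueError(f"no SM value known for chipset {chipset:#x}")


print(sm_for_chipset(0x192), sm_for_chipset(0x180))
```

The point of the ordering is that a naive "newer chipset ID means higher SM" rule gets Ada wrong, since its chipset ID is above Hopper's while its SM version is below.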
14:10fdobridge: <karolherbst🐧🦀> sooo.. let's run the CTS with my script and see how bad it is
14:24fdobridge: <gfxstrand> Rebased. That was mildly painful... I'm running CTS on the rebase now.
14:24fdobridge: <karolherbst🐧🦀> okay...
14:24fdobridge: <gfxstrand> I'll push once CTS is done
14:25fdobridge: <gfxstrand> Eh, it's not dying in a fire. I'll push now and push again if I have to.
14:25fdobridge: <gfxstrand> There you go. Rebased.
14:25fdobridge: <karolherbst🐧🦀> 😄
14:26fdobridge: <gfxstrand> Alyssa (I think) changed `PIPE_PRIM_*` to `MESA_PRIM_*` and that was a bit annoying.
14:26fdobridge: <gfxstrand> There was also a NIR change
14:26fdobridge: <karolherbst🐧🦀> "gsp: Xid:13 Graphics Exception: SKEDCHECK05_LOCAL_MEMORY_TOTAL_SIZE failed" @airlied
14:27fdobridge: <gfxstrand> ```sh
14:27fdobridge: <gfxstrand> #! /bin/bash
14:27fdobridge: <gfxstrand>
14:27fdobridge: <gfxstrand> OUTDIR="${1}"
14:27fdobridge: <gfxstrand>
14:27fdobridge: <gfxstrand> if ! mkdir "${OUTDIR}"; then
14:27fdobridge: <gfxstrand> echo "${OUTDIR} already exists!"
14:27fdobridge: <gfxstrand> exit 1
14:27fdobridge: <gfxstrand> fi
14:27fdobridge: <gfxstrand>
14:27fdobridge: <gfxstrand> dmesg --follow > "${OUTDIR}/dmesg" &
14:27fdobridge: <gfxstrand> DMESG_PID="$!"
14:27fdobridge: <gfxstrand>
14:27fdobridge: <gfxstrand> export MESA_VK_ABORT_ON_DEVICE_LOSS=1
14:27fdobridge: <gfxstrand>
14:27fdobridge: <gfxstrand> # Disable some codegen optimizations for now
14:27fdobridge: <gfxstrand> export NV50_PROG_OPTIMIZE=1
14:27fdobridge: <gfxstrand>
14:28fdobridge: <gfxstrand> SKIPS=$(cat <<-END
14:28fdobridge: <gfxstrand> dEQP-VK.api.object_management.max.*
14:28fdobridge: <gfxstrand> dEQP-VK.glsl.derivate..*
14:28fdobridge: <gfxstrand> dEQP-VK.graphicsfuzz..*
14:28fdobridge: <gfxstrand> dEQP-VK.image.swapchain_mutable..*
14:28fdobridge: <gfxstrand> dEQP-VK.wsi..*
14:28fdobridge: <gfxstrand> END
14:28fdobridge: <gfxstrand> )
14:28fdobridge: <gfxstrand>
14:28fdobridge: <gfxstrand> PRE_TURING_SKIPS=$(cat <<-END
14:28fdobridge: <gfxstrand> dEQP-VK.query_pool..*copy_result.*
14:28fdobridge: <gfxstrand> .*null_descriptor.*
14:28fdobridge: <gfxstrand> .*cmdcopyquerypoolresults.*
14:28fdobridge: <karolherbst🐧🦀> mhhh
14:28fdobridge: <gfxstrand> I could probably turn derivative tests back on now that helpers work
14:29fdobridge: <karolherbst🐧🦀> ohh.. I figured out what I messed up locally... oh well, let's run it now
14:34fdobridge: <karolherbst🐧🦀> heh "deqp-vk[20786]: VMM allocation failed: -22"
14:34fdobridge: <gfxstrand> woo?
14:34fdobridge: <gfxstrand> That looks "fun"
14:35fdobridge: <karolherbst🐧🦀> yeah.. no idea, might be some GSP stuff
14:35fdobridge: <karolherbst🐧🦀> anyway.. "Pass: 19139, Fail: 302, Crash: 13, Warn: 1, Skip: 62479, Flake: 566, Duration: 2:54, Remaining: 1:08:12"
14:35fdobridge: <karolherbst🐧🦀> uhh.. let me see if vkcube still runs
14:36fdobridge: <karolherbst🐧🦀> ahh yeah.. it's using lavapipe now 😄
14:36fdobridge: <gfxstrand> hehe
14:36fdobridge: <gfxstrand> Yeah, those don't look like NVK numbers. 😛
14:36fdobridge: <karolherbst🐧🦀> I think I'm gonna remove those icd files
14:37fdobridge: <gfxstrand> Yeah, when testing I use `VK_ICD_FILENAMES=` to ensure I get exactly one driver and it's the one I want.
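Pinning the loader to one driver this way can be wrapped in a small helper when launching tests from a script. A sketch; the manifest path below is a placeholder, not a real install location, and `single_icd_env` is a made-up name:

```python
import os


def single_icd_env(icd_json: str, base_env=None) -> dict:
    """Return an environment that points the Vulkan loader at one ICD manifest."""
    env = dict(os.environ if base_env is None else base_env)
    env["VK_ICD_FILENAMES"] = icd_json
    return env


# Placeholder path; use wherever your Mesa build installed the NVK manifest.
env = single_icd_env("/path/to/nouveau_icd.x86_64.json", base_env={})
print(env)
```

The resulting dict can be passed as `env=` to `subprocess.run(["vkcube"], ...)`, which avoids silently falling back to lavapipe as happened a few lines earlier.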
14:42fdobridge: <karolherbst🐧🦀> okay.. should be good now
14:48fdobridge: <karolherbst🐧🦀> "Pass: 24817, Fail: 926, Crash: 1, Skip: 145754, Flake: 2, Duration: 3:31, Remaining: 37:50"
14:52fdobridge: <gfxstrand> That's a lot of fail for the first 3 min
14:57fdobridge: <karolherbst🐧🦀> "Pass: 61145, Fail: 2262, Crash: 9, Warn: 1, Skip: 358333, Timeout: 3, Missing: 1593380, Flake: 110, Duration: 9:11, Remaining: 0"
14:57fdobridge: <karolherbst🐧🦀> ehh.. seems like my GPU crashed again 😄
14:57fdobridge: <karolherbst🐧🦀> "Pass: 60767, Fail: 2223, Crash: 8, Warn: 1, Skip: 355493, Timeout: 3, Flake: 5, Duration: 9:05, Remaining: 34:41"
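Status lines like the ones quoted above are easier to compare once parsed. A minimal sketch that pulls out the integer counters and ignores the two time fields; treating a non-zero `Missing` count as a sign of an aborted run is an interpretation of the chat, not documented runner behaviour:

```python
import re

TIME_FIELDS = {"Duration", "Remaining"}


def parse_summary(line: str) -> dict:
    """Extract integer counters (Pass, Fail, ...) from a runner status line."""
    pairs = re.findall(r"(\w+): (\d+)(?=,|$)", line)
    return {key: int(value) for key, value in pairs if key not in TIME_FIELDS}


final = parse_summary(
    "Pass: 61145, Fail: 2262, Crash: 9, Warn: 1, Skip: 358333, Timeout: 3, "
    "Missing: 1593380, Flake: 110, Duration: 9:11, Remaining: 0"
)
print(final)
print("tests never run:", final.get("Missing", 0))
```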
14:58fdobridge: <gfxstrand> Yeah... looks like
14:58fdobridge: <karolherbst🐧🦀> well.. not tooooo bad, but I guess GSP isn't there yet
14:58fdobridge: <karolherbst🐧🦀> I think it makes more sense to run with Ampere and fix the bugs there
14:58fdobridge: <karolherbst🐧🦀> like that `SKEDCHECK05_LOCAL_MEMORY_TOTAL_SIZE` error
14:58fdobridge: <karolherbst🐧🦀> anyway.. Ada == Ampere
15:03fdobridge: <gfxstrand> Okay, rebase run checks out
15:16fdobridge: <esdrastarsis> ben updated the 00.02-gsp-rm branch recently
15:18fdobridge:<gfxstrand> lives dangerously and tries another new uAPI run with a rebased Mesa branch
15:24fdobridge: <karolherbst🐧🦀> https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/231 guess more won't be needed...
15:25fdobridge: <karolherbst🐧🦀> @gfxstrand you don't have an Ampere GPU or do you?
15:25fdobridge: <gfxstrand> No, not yet
15:28fdobridge: <gfxstrand> Merged
15:28fdobridge: <gfxstrand> I'll probably get some newer cards once NAK is in decent shape.
15:28fdobridge: <karolherbst🐧🦀> cool
15:28fdobridge: <gfxstrand> And once GSP is in shape such that I can use a lovelace as my daily driver.
15:29fdobridge: <karolherbst🐧🦀> I'll probably figure out that local memory thing, because that's the only error I was seeing on Ada as well..
15:29fdobridge: <gfxstrand> Cool
15:29fdobridge: <gfxstrand> That's probably just a QMD thing
15:29fdobridge: <gfxstrand> Or is it for all stages?
15:29fdobridge: <karolherbst🐧🦀> it's compute only
15:29fdobridge: <karolherbst🐧🦀> there is also another error, but no idea what that is all about:
15:29fdobridge: <karolherbst🐧🦀> [ 1324.344091] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:16 type:45 scope:1 part:233
15:29fdobridge: <karolherbst🐧🦀> [ 1324.344098] nouveau 0000:01:00.0: fifo:c00000:0002:0002:[Xorg[10237]] errored - disabling channel
15:29fdobridge: <gfxstrand> Yeah, probably a bit moved in QMD
15:30fdobridge: <karolherbst🐧🦀> or some weirdo alignment thing or something
15:30fdobridge: <karolherbst🐧🦀> I'll play around with it
15:30fdobridge: <karolherbst🐧🦀> the annoying part with ada is we don't have the compute class header...
15:31fdobridge: <karolherbst🐧🦀> but 3D is 100% identical to Ampere
15:31fdobridge: <karolherbst🐧🦀> one concerning part is dma-copy
15:33fdobridge: <karolherbst🐧🦀> hopper dma-copy: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/common/sdk/nvidia/inc/class/clc8b5.h
15:33fdobridge: <karolherbst🐧🦀> no header for ada either
15:33fdobridge: <karolherbst🐧🦀> but ada seems to be the same as ampere here as well.. otherwise how would anything work 😄
15:35fdobridge: <karolherbst🐧🦀> Hopper is probably entirely broken, but I'm also not concerned about users running nouveau on hopper
15:38fdobridge: <gfxstrand> I might care eventually
15:38fdobridge: <gfxstrand> But not today
15:38fdobridge: <karolherbst🐧🦀> I think the problem with hopper is that it can't do 3D
15:38fdobridge: <gfxstrand> Sure
15:38fdobridge: <gfxstrand> Vulkan compute, baby!
15:38fdobridge: <gfxstrand> Or rusticl
15:38fdobridge: <karolherbst🐧🦀> just use mesh shaders 😛
15:38fdobridge: <gfxstrand> Or rusticl + zink + NVK
15:38fdobridge: <gfxstrand> Or something
15:39fdobridge: <karolherbst🐧🦀> yeah, but hopper is like.. expensive 😄
15:39fdobridge: <karolherbst🐧🦀> it's really a DC only GPU
15:40fdobridge: <gfxstrand> Yeah, I know
15:40fdobridge: <gfxstrand> Like I said. I might care eventually but not today.
15:41fdobridge: <gfxstrand> If we build it right, what we build for client GPUs should scale to the datacenter. We probably can't beat nvidia at their own CUDA game but we should be able to scale.
15:41fdobridge: <karolherbst🐧🦀> right
16:48fdobridge: <gfxstrand> Wow. Got a second CTS run with the new uAPI to survive. 🤯
17:10fdobridge: <marysaka> Nice :vibrate:
17:20fdobridge: <mohamexiety> does GSP run with NVK now?
17:21fdobridge: <mohamexiety> last time I tried it just failed with something related to sync or so 😮
17:23fdobridge: <mohamexiety> you get two (2) TPCs that can do graphics if you're brave enough hahaha (edited)
17:30HdkR: Puts in to perspective in the latest datacenter keynote that Jensen did. "Can it run Crysis?", Not very efficiently!
17:39fdobridge: <esdrastarsis> yeah, I think the problem with double free on turing using gsp was this function, Ben removed it
18:34fdobridge: <esdrastarsis> Ziggurat (OpenGL native game), 2560x1080 High Quality 60 Fps on Wayland (Sway), my gpu is GTX 1650 (Turing) using Nouveau with GSP reclocking
18:34fdobridge: <esdrastarsis> https://cdn.discordapp.com/attachments/1034184951790305330/1131292988912455890/20230719_15h30m35s_grim.png
18:34fdobridge: <esdrastarsis> finally, nouveau gaming
18:36fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Try 1080p max settings on SuperTuxKart
18:40fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://www.youtube.com/watch?v=paeaveMZms0
18:43fdobridge: <esdrastarsis> 30 fps 🐸
18:43fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> A third of proprietary performance 🤔
18:44fdobridge: <esdrastarsis> Codegen memes?
18:45fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Faith has said that NAK generated more optimized instructions than codegen some time ago
19:40fdobridge: <gfxstrand> The nouveau GL driver is also doing some pretty serious nonsense
19:40fdobridge: <gfxstrand> NVK should be doing less nonsense in theory but it's still pretty stall-happy
19:40fdobridge: <ttabi1> @esdrastarsis Ben's latest code works on Turing now.
19:43fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> What nonsense?
19:45fdobridge: <esdrastarsis> Yeah, I'm testing now (see my screenshot above), thanks for letting me know
19:45fdobridge: <ttabi1> Sweet!
19:48fdobridge: <esdrastarsis> Was the nvkm_firmware_put(blob) in goto done the culprit of the double free?
19:50fdobridge: <karolherbst🐧🦀> @ttabi1 while you are here, any idea what's going on here? https://gist.githubusercontent.com/karolherbst/3a6a06e87236f17a7212de10e3700283/raw/4ee983098c8e7413e52d73a9f9c953c8c2fd2d5d/gistfile1.txt
19:50fdobridge: <karolherbst🐧🦀> this happens after running the VK CTS for a while on Ada
19:51fdobridge: <ttabi1> I didn't check the code to see what changed yet.
19:51fdobridge: <ttabi1> Hmmm could be anything.
19:51fdobridge: <ttabi1> You'd have to ask Ben.
19:52fdobridge: <ttabi1> GSP-RM is such a beast that I never just "know" what's going on, I have to debug it.
19:52fdobridge: <karolherbst🐧🦀> fair enough
19:53fdobridge: <karolherbst🐧🦀> kinda feels like GSP crashes or something, but ....
19:53fdobridge: <ttabi1> GSP-RM crashes usually appear as a timeout sending RPCs
19:54fdobridge: <karolherbst🐧🦀> mhh...
19:54fdobridge: <ttabi1> I have plans on adding a whole bunch of error handling/logging stuff once Ben's code is upstream.
20:00fdobridge: <karolherbst🐧🦀> that would be very helpful!
20:12fdobridge: <karolherbst🐧🦀> @gfxstrand so uhm.. ada doesn't like shaders with 0 gprs 😄
20:13fdobridge: <karolherbst🐧🦀> I have to figure out what's going on there, but we might not want to set gprs to 0 and have it be 4 at the minimum or something...
20:28fdobridge: <mohamexiety> I guess ampere behaves differently? this is interesting cuz it's mostly the same SM 😮
20:29fdobridge: <airlied> @gfxstrand any big regressions stand out on uapi?
20:35fdobridge: <karolherbst🐧🦀> no idea
20:43fdobridge: <karolherbst🐧🦀> @gfxstrand what's the proper way of dumping shaders with nvk?
20:43fdobridge: <karolherbst🐧🦀> but anyway.. the slm buffer is too small on ampere/ada
20:43fdobridge: <karolherbst🐧🦀> now figuring out why that is
20:48fdobridge: <airlied> Is the dump same as for gl?
20:49fdobridge: <karolherbst🐧🦀> mhh.. sooo... in one of those tests the per thread local memory is 0x420 and the global slm buffer is 0x4e60000, but that's an invalid combination
20:49fdobridge: <karolherbst🐧🦀> it appears that a global slm buffer of "0x4e60000" can support up to 0x2c0 per thread local memory
20:51fdobridge: <karolherbst🐧🦀> mp count is 38
20:52fdobridge: <karolherbst🐧🦀> a bit too much of a difference to be a simple alignment problem...
20:54fdobridge: <karolherbst🐧🦀> the calculated size for 0x2c0 per thread would be 0x3440000 (edited)
20:54fdobridge: <karolherbst🐧🦀> ignoring alignment, we multiply the per thread one by 32 * 64 * mp_count
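The arithmetic in the last few messages can be reproduced directly. A sketch of the formula as stated (per-thread size times 32 threads per warp, 64 warps per MP, and the MP count), ignoring alignment; the function name is made up here:

```python
THREADS_PER_WARP = 32
WARPS_PER_MP = 64  # the value assumed in the chat at this point


def slm_size(per_thread: int, mp_count: int, warps_per_mp: int = WARPS_PER_MP) -> int:
    """Total local-memory buffer size for a given per-thread size."""
    return per_thread * THREADS_PER_WARP * warps_per_mp * mp_count


print(hex(slm_size(0x2C0, 38)))  # 0x3440000, the size quoted for 0x2c0
print(hex(slm_size(0x420, 38)))  # 0x4e60000, the buffer actually allocated
print(0x420 / 0x2C0)             # 1.5
```

So the allocated buffer matches the formula for 0x420 per thread, while the hardware only seemed to accept 0x2c0 with it: the formula comes up short by a factor of 1.5.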
20:54fdobridge: <karolherbst🐧🦀> let's see if the cuda hw support feature shows anything obvious we are missing since ampere
20:55fdobridge: <karolherbst🐧🦀> ahh yeah...
20:57fdobridge: <karolherbst🐧🦀> mhhh strange
20:58fdobridge: <gfxstrand> Not that I'm seeing. Doing a bit of refactoring of the code right now.
20:58fdobridge: <gfxstrand> Refactoring is the best form of review. 😁
21:02fdobridge: <karolherbst🐧🦀> yeah...
21:02fdobridge: <karolherbst🐧🦀> it's a factor of 1.5 indeed
21:02fdobridge: <karolherbst🐧🦀> the heck
21:02fdobridge: <karolherbst🐧🦀> a warp is still 32 threads, because uhm... 48 would be weird
21:02fdobridge: <karolherbst🐧🦀> and we still only have 64 warps per mp
21:03fdobridge: <karolherbst🐧🦀> maybe the mp count is wrong...
21:04fdobridge: <karolherbst🐧🦀> but 38*1.5 would be 57.. mhh.. maybe it's more like 58 or something.. let's see how I can verify this
21:07fdobridge: <karolherbst🐧🦀> does nvidia report how many mps a GPU has?
21:14fdobridge: <mohamexiety> mp?
21:15fdobridge: <mohamexiety> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications think this is the most they reveal out of device limits
21:16fdobridge: <mohamexiety> table 15
21:16fdobridge: <mhenning> "The combined capacity of the L1 data cache and shared memory is 192 KB/SM in A100 versus 128 KB/SM in V100." <- a 1.5x difference (from https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf )
21:17fdobridge: <karolherbst🐧🦀> local memory
21:17fdobridge: <karolherbst🐧🦀> not shared
21:17fdobridge: <karolherbst🐧🦀> local memory is a buffer located in VRAM
21:18fdobridge: <mohamexiety> if mp = sm then 48 is the max number of warps per SM for consumer ampere / ada, and 64 is the max number for pro Ampere (A100)
21:21fdobridge: <mhenning> oops, I guess I misinterpreted what slm stands for then
21:46fdobridge: <karolherbst🐧🦀> OKAY
21:47fdobridge: <karolherbst🐧🦀> it's a factor of 2 missing
21:47fdobridge: <karolherbst🐧🦀> I don't have 64 warps per SM, I only have 48
21:47fdobridge: <karolherbst🐧🦀> I asked Ben to expose that information via the device info IOCTL thing
21:47fdobridge: <karolherbst🐧🦀> because atm we don't
21:48fdobridge: <karolherbst🐧🦀> or is that the `gpc_count`? let me check...
21:49fdobridge: <mohamexiety> yeah only pro ampere does 64 interestingly
21:55fdobridge: <karolherbst🐧🦀> yeah.. so since ampere there are two SMs per TPC
22:00fdobridge: <karolherbst🐧🦀> huh...
22:01fdobridge: <karolherbst🐧🦀> yeah...
22:01fdobridge: <karolherbst🐧🦀> there is a factor of 2 missing
22:10fdobridge: <karolherbst🐧🦀> maybe the mmio thing is different there?
22:11fdobridge: <karolherbst🐧🦀> but.. strange...
22:11fdobridge: <karolherbst🐧🦀> https://cdn.discordapp.com/attachments/1034184951790305330/1131347669097402430/gp100_block_diagram-1.png
22:11fdobridge: <karolherbst🐧🦀> ehh.. wrong chat 😄
22:17fdobridge: <karolherbst🐧🦀> @gfxstrand mhhh.. do you see any local memory related bugs running the CTS?
22:17fdobridge: <karolherbst🐧🦀> because... we sure are calculating that shit incorrectly
22:17fdobridge: <karolherbst🐧🦀> sooo.. the nouveau ioctl gives us the _tpc_ count, not _mp_count_
22:18fdobridge: <karolherbst🐧🦀> what's the difference?
22:18fdobridge: <karolherbst🐧🦀> on some gens, a TPC has one SM/MP
22:18fdobridge: <karolherbst🐧🦀> on others a TPC has two SM/MPs
22:18fdobridge: <karolherbst🐧🦀> odd part
22:18fdobridge: <karolherbst🐧🦀> GP100 has 2 per TPC, GP102+ has 1, TU102+ has two again
22:18fdobridge: <karolherbst🐧🦀> but Ben is also not sure if the reported values are all sane or not
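[editor's note] The TPC-to-SM relationship described above could be sketched as a small lookup. Everything here is illustrative: the function names are hypothetical, the chipset ids (0x130 for GP100, 0x132 for GP102, 0x162 for TU102, 0x192 for an Ada part) are assumptions about nouveau's numbering, and the per-generation values are only what the chat states, which the chat itself flags as not fully verified.

```python
GP100 = 0x130  # assumed nouveau chipset id
GP102 = 0x132  # assumed nouveau chipset id
TU102 = 0x162  # assumed nouveau chipset id


def sms_per_tpc(chipset):
    """SMs (MPs) per TPC, using only the generations named in the chat."""
    if chipset < GP100:
        # The chat does not say anything about older generations.
        raise ValueError("generation not covered by the discussion")
    if chipset == GP100:
        return 2
    if chipset >= TU102:
        return 2
    return 1  # GP102 and later, before Turing


def mp_count_from_tpc(tpc_count, chipset):
    """Derive the SM/MP count from the TPC count the ioctl reports."""
    return tpc_count * sms_per_tpc(chipset)


ADA = 0x192  # assumed chipset id for an Ada part
print(mp_count_from_tpc(38, ADA))
```

With the TPC count of 38 from the chat, this gives 76 SMs on an Ada part, which is the missing factor of 2.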
22:25fdobridge: <karolherbst🐧🦀> @airlied if you want to run the CTS on ampere again, you need to double the slm buffer size inside `nvk_slm_area_ensure` at `uint64_t size = bytes_per_mp * dev->pdev->dev->mp_count;`. Just stick a `* 2` in there and it should just work (tm)
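[editor's note] A rough check of why the `* 2` workaround is enough, under the assumptions raised in the chat: the reported count of 38 is really a TPC count, each TPC has two SMs on this GPU, and each SM runs 48 warps rather than 64. The helper name `slm_size` is hypothetical and alignment is ignored.

```python
WARP_SIZE = 32  # threads per warp


def slm_size(bytes_per_thread, warps_per_sm, sm_count):
    """Local-memory buffer size in bytes, ignoring alignment."""
    return bytes_per_thread * WARP_SIZE * warps_per_sm * sm_count


tpc_count = 38      # what the ioctl actually reports, per the chat
per_thread = 0x420  # per-thread local memory from the failing test

original = slm_size(per_thread, 64, tpc_count)    # current calculation
workaround = original * 2                         # the suggested "* 2"
needed = slm_size(per_thread, 48, tpc_count * 2)  # 48 warps/SM, 2 SMs/TPC

print(hex(original), hex(workaround), hex(needed))
print(needed / original, workaround >= needed)
```

Under these assumptions the required size is 1.5 times the original, so doubling over-allocates by a third but stays on the safe side.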
22:25fdobridge: <karolherbst🐧🦀> I'm seeing faults, but that might be the shader doing dumb shit
22:25fdobridge: <karolherbst🐧🦀> anyway.. I'll clean that mess up because it's slightly wrong 😄
22:35fdobridge: <gfxstrand> When I hooked stuff up for NAK, it looked a bit sketchy. I didn't dig in, though.
22:35fdobridge: <gfxstrand> @airlied One thing I think we're still missing from the new uAPI is a new submit ioctl which just takes an array of unlimited length of virtual addresses.
22:36fdobridge: <gfxstrand> Or it can be limited length and we can ioctl multiple times.
22:36fdobridge: <airlied> huh the exec ioctl is the new submit
22:36fdobridge: <gfxstrand> It just makes syncobj wrangling easier if it's unlimited
22:36fdobridge: <gfxstrand> Hrm... I haven't found that in the MR yet
22:36fdobridge: <airlied> it takes pushes which are vaddr/length
22:37fdobridge: <karolherbst🐧🦀> mhh.. though the per thread value should be fine, it's really just that we have to rename `mp_count` to `tpc_count` and double it on some architectures. But there is also this constant 64 warps per `mp` which is less on a couple of GPUs... anyway, I'm kinda trying to figure out what needs to be fixed on what generation of GPU
22:37fdobridge: <airlied> struct drm_nouveau_exec
22:37fdobridge: <airlied> okay it's limited to 32-bits of exec ptrs
22:37fdobridge: <gfxstrand> Hrm... Maybe it's hidden in this queue commit
22:37fdobridge: <karolherbst🐧🦀> maybe it all works out in some shaders which don't use a ton of TLS space due to the alignment thingies...
22:37fdobridge: <gfxstrand> Yeah, 32 bits should be enough. 😅
22:38fdobridge: <airlied> but yeah exec takes 3 counts, 1 for sync waits, 1 for sync signals and one for pushes
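[editor's note] The shape of the exec submission described above (three counts, pushes given as vaddr/length pairs, counts limited to 32 bits) might be modelled like this. It is a toy illustration only: the class and field names are hypothetical and are not taken from the kernel header for `struct drm_nouveau_exec`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

U32_MAX = 2**32 - 1  # counts are described as 32-bit in the chat


@dataclass
class ExecRequest:
    # Hypothetical field names, for illustration only.
    waits: List[int] = field(default_factory=list)    # syncs to wait on
    signals: List[int] = field(default_factory=list)  # syncs to signal
    pushes: List[Tuple[int, int]] = field(default_factory=list)  # (vaddr, length)

    def counts(self):
        """Return (wait count, signal count, push count), checked against 32 bits."""
        counts = (len(self.waits), len(self.signals), len(self.pushes))
        if any(c > U32_MAX for c in counts):
            raise ValueError("count does not fit in 32 bits")
        return counts


req = ExecRequest(waits=[1], signals=[2, 3], pushes=[(0x1000, 64)])
print(req.counts())
```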
22:39fdobridge: <gfxstrand> cool
22:46fdobridge: <gfxstrand> I'm going to push the first 4 patches in the MR if this current run goes okay
22:46fdobridge: <gfxstrand> It's almost 6:00 PM here so I think I'm probably done for the evening. I'll get to exec tomorrow.
22:46fdobridge: <airlied> cool, I'll look over some cleanups when I get out of my next meeting
22:46fdobridge: <gfxstrand> So consider the lock on the MR branch released if you want to do any bugfixing
22:47fdobridge: <gfxstrand> I've not been bothering to make fixup commits. I just `git commit --amend`
22:48fdobridge: <airlied> I don't quite get your comment on binding over with sparse, but I'll try and figure out what you mean 🙂
22:50fdobridge: <gfxstrand> I mean that on `nvk_DestroyImage()` or `nvk_DestroyBuffer()`, we should just free the VMA range. Right now, we're doing a sparse bind over it.
22:51fdobridge: <gfxstrand> It still releases any bound BOs so it's not like we're going to start leaking memory but it leaves the memory range bound to whatever the sparse null page thingy looks like.
22:53fdobridge: <airlied> we shouldn't be, it should be just unbind and free
22:53fdobridge: <airlied> if it was a sparse mapping we unbind it
22:56fdobridge: <airlied> sparse buffers get created, get a sparse mapping, and when one is destroyed, we destroy that mapping
23:13fdobridge: <esdrastarsis> Wolfenstein: The New Order on low settings, nice
23:13fdobridge: <esdrastarsis> https://cdn.discordapp.com/attachments/1034184951790305330/1131363198138859541/20230719_20h06m44s_grim.png
23:17fdobridge: <gfxstrand> Hrm... Maybe I misread
23:18fdobridge: <gfxstrand> @airlied I guess I don't get what the difference is between unbind and unbind with `VM_BIND_SPARSE`
23:22fdobridge: <airlied> ah so if we unbind the sparse to a normal bind and then unbind that it's kinda pointless
23:24fdobridge: <airlied> btw we have run this on pascal in the past and the basics did seem to work
23:42fdobridge: <gfxstrand> I guess I don't get why there's a difference between a sparse and non-sparse unbind.
23:45fdobridge: <gfxstrand> I expected it to work like mmap where binds just overwrite what's there. A sparse bind just sets the soft fault bit on the given range. A non-sparse bind fills it with pages from the BO, and an unbind removes whatever's there and leaves it in a full fault state. (edited)
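[editor's note] The mmap-like model described in the message above can be written down as a toy page table. This sketches the expected semantics only, not how the nouveau uAPI actually behaves; the class and method names are hypothetical, and addresses are simplified to page indices.

```python
FAULT = "fault"    # nothing mapped: access is a full fault
SPARSE = "sparse"  # soft-fault bit set on the page


class AddressSpace:
    """Toy VA space where every bind simply overwrites what is there."""

    def __init__(self):
        self.pages = {}  # page index -> SPARSE or (bo, page offset in bo)

    def bind_sparse(self, start, count):
        # A sparse bind only marks the range as soft-faulting.
        for page in range(start, start + count):
            self.pages[page] = SPARSE

    def bind_bo(self, start, count, bo, bo_offset=0):
        # A non-sparse bind fills the range with pages from the BO.
        for i, page in enumerate(range(start, start + count)):
            self.pages[page] = (bo, bo_offset + i)

    def unbind(self, start, count):
        # An unbind removes whatever is there, sparse or not.
        for page in range(start, start + count):
            self.pages.pop(page, None)

    def lookup(self, page):
        return self.pages.get(page, FAULT)


vm = AddressSpace()
vm.bind_sparse(0, 4)
vm.bind_bo(1, 2, "bo0")
print([vm.lookup(p) for p in range(4)])
vm.unbind(0, 4)
print([vm.lookup(p) for p in range(4)])
```

In this model there is only one kind of unbind, which is the point of the question being asked here.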
23:49fdobridge: <airlied> actually it's likely I can just drop one of those paths in the userspace, I think it's just an aftereffect from previous iterations
23:49fdobridge: <airlied> we shouldn't be calling the normal unbind_vma in the sparse path
23:58fdobridge: <gfxstrand> Why not? I'm still confused as to what a sparse unbind is at all. 🤷🏻♀️
23:58fdobridge: <gfxstrand> What does that even mean? How is it different from a regular unbind?