00:00x512[m]: I thought they added dynamically allocable ring buffers.
00:01airlied[d]: I think the latest AMD GPUs have two gfx queues
00:01airlied[d]: and you can configure them as a global or user submit, but since we have a driver model that needs one global for legacy reasons, the other can be used for user submits
00:06cubanismo[d]: FWIW, if anyone ever gets to the point of writing down the userspace submission ideas they have, I'd be interested in being CC'd
00:06x512[m]: Is there a simple Vulkan GPU hang demo with source code somewhere?
00:07cubanismo[d]: The proprietary driver does userspace submission that interops with dma-fence, which I mostly wrote. Most of it is in the open source code if anyone wants to go inspect it. There are indeed parts that might be considered... creative.
00:07airlied[d]: not aware of a hang demo, but I haven't really looked
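00:07airlied[d]: the classic trick is a shader that can't terminate, e.g. a compute shader spinning on a buffer value the host never changes (untested sketch):
00:07airlied[d]: #version 450
00:07airlied[d]: layout(local_size_x = 1) in;
00:07airlied[d]: layout(set = 0, binding = 0) buffer Buf { uint stop; };  // host writes 0 and never touches it again
00:07airlied[d]: void main() { while (stop == 0u) { } }
00:07airlied[d]: submit that and the kernel's job timeout should kill the channel after ~10s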
00:08gfxstrand[d]: airlied[d]: This
00:09cubanismo[d]: I don't think I did anything particularly clever, so I'd be curious whether there are interesting corner cases I didn't consider
00:09gfxstrand[d]: Microsoft did it by switching their entire synchronization model in one atomic motion with Windows 8. They also banned all hardware that can't preempt at the same time as well as made very specific window system choices, all to allow for that change. Linux doesn't have that luxury.
00:10airlied[d]: I'd love to have the power to tell everyone they need to write new drivers 😛
00:10cubanismo[d]: gfxstrand[d]: And they still had some pretty nasty bugs if userspace used old primitives on the new thing
00:12x512[m]: Haiku has that luxury
00:12x512[m]: :)
00:12gfxstrand[d]: airlied[d]: The DRM subsystem will be switching to rust by 2030. Someone go tell Phoronix.
00:14airlied[d]: haiku has the luxury of not supporting intel and amd gpus if it wants 😛
00:15x512[m]: gfxstrand[d]: Maybe use a different driver model for old and new GPUs in Linux? To let users get the benefits when using new hardware.
00:15gfxstrand[d]: That works right up until someone plugs in a USB stick and expects to display from their Intel card on it.
00:16airlied[d]: or even just puts a new gpu in an old intel motherboard
00:16gfxstrand[d]: Or plugs an Nvidia card into a motherboard with a pre-xe Intel card.
00:16gfxstrand[d]: Or an old card into a new AMD motherboard
00:16x512[m]: GPU - CPU - GPU sync can always work.
00:17gfxstrand[d]: Though at least in those cases we could switch based on hardware config as long as we don't care about eGPUs
00:18gfxstrand[d]: x512[m]: Yes and that's what we'd have to make work, but completely transparent to userspace because we also have decades of compositors to keep working.
00:19x512[m]: Open source NVRM is really a treasure for alternative OSes.
00:23x512[m]: cubanismo[d]: Am I understanding right that the current proprietary Nvidia driver wakes up ALL userland clients when some GPU job is done?
00:28gfxstrand[d]: Define "wake up"? Yes, if you have more userspace clients than GSP contexts and/or you're out of memory and have stuff swapped to disk or some place similarly inaccessible to the GPU then, yes, you'd have to cycle through contexts at some point. But as long as everything fits within available system resources, the GSP can easily cycle contexts and see who's still waiting.
00:29x512[m]: All userland poll() calls will return when a GPU job is done.
00:30x512[m]: Not only the client that submitted the GPU job.
00:31x512[m]: NV01_EVENT object with userland FD.
00:36x512[m]: According to my analysis of the open NVRM code and my test code, the kernel driver just broadcasts a wakeup to all userland clients and does not try to check the context/channel, etc.
01:03gfxstrand[d]: Yeah. You kinda have to. With shared fences, it's not easy to know what userspace to wake up so you kinda have to wake up everybody. You can move that wakeup into the kernel so userspace never notices if it's for someone else, but that's the best you can do.
01:16x512[m]: In theory NV01_EVENT could remember which channel it belongs to.
01:24gfxstrand[d]: Sure but that doesn't mean the context that signaled the interrupt and the context waiting are the same.
03:29cubanismo[d]: @x512[m] Not since I fixed that. See https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/nvidia/src/kernel/gpu/mem_mgr/sem_surf.c
03:30x512[m]: Yes, but sem_surf seems not to be used by the proprietary Nvidia Vulkan driver.
06:26kar1m0[d]: mohamexiety[d]: It only works with 2 laptops and Ubuntu. Also there isn't any code?
07:16notthatclippy[d]: airlied[d]: Just occurred to me that user- and kernel-space submissions can have the same API. If userspace submission is not allowed, trigger a pagefault on the submission address and ensure the relevant memory is resident before kicking off the work. The architecture/API doesn't assume one thing or the other, but if the kernel thinks that no forward progress is being made due to the 51% issue, it can just block
07:16notthatclippy[d]: userspace submissions for some or all contexts.
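07:17notthatclippy[d]: Kernel side it'd look vaguely like this (every helper name here is made up):
07:17notthatclippy[d]: /* the doorbell page is unmapped whenever user submit is blocked for the context */
07:17notthatclippy[d]: static vm_fault_t doorbell_fault(struct vm_fault *vmf)
07:17notthatclippy[d]: {
07:17notthatclippy[d]:         struct submit_ctx *ctx = vmf->vma->vm_private_data;
07:17notthatclippy[d]:
07:17notthatclippy[d]:         make_allocations_resident(ctx);          /* page in whatever the job needs */
07:17notthatclippy[d]:         if (user_submit_allowed(ctx)) {
07:17notthatclippy[d]:                 remap_doorbell(ctx, vmf->vma);   /* restore the fast path */
07:17notthatclippy[d]:                 return VM_FAULT_NOPAGE;
07:17notthatclippy[d]:         }
07:17notthatclippy[d]:         submit_from_kernel(ctx);                 /* 51% fallback: kernel drains the ring */
07:17notthatclippy[d]:         return VM_FAULT_NOPAGE;
07:17notthatclippy[d]: }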
07:29mohamexiety[d]: kar1m0[d]: Yeah looks like they didn’t upload it yet. I don’t really know any other way tbh
07:30mohamexiety[d]: The driver can’t really do much here since the exposed PL is configured by the OEM
11:35karolherbst[d]: gfxstrand[d]: mhenning[d] does any of you have some time to review a couple of MRs? !36514 !36528 and !36536 are relatively trivial, but would also be nice to get reviews on the other MRs
13:13kar1m0[d]: https://www.nvidia.com/en-in/drivers/details/252613/
13:14kar1m0[d]: 580 drivers released
13:52mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1404824904276250714/message.txt?ex=689c98a5&is=689b4725&hm=e727e8751b968c1ac0d9428876c74735a9d5eb4e28a6b229953473c903ca5eb7&
13:52mohamexiety[d]: having a bit of a weird issue; trying to have nvk and prop nv installed at the same time (with different kernels) on an arch system and it looks like on the nvk side, it can't find any physical devices. is there a way to figure out why/what's going wrong?
13:53mohamexiety[d]: this is on mesa 25.2 with blackwell support on a blackwell only system. the package does have the blackwell enablement patch
13:56snowycoder[d]: Is the nouveau driver in use for the GPU? Sometimes even if you don't load nvidia-prop, the blacklist can mess it up.
13:56mohamexiety[d]: yeah it is
13:56mohamexiety[d]: in dmesg nouveau picks up the GPU and all is fine on the kernel side
13:58mohamexiety[d]: [ 5.141056] nouveau 0000:01:00.0: NVIDIA GB202 (1b2000a1)
13:58mohamexiety[d]: [ 5.189236] nouveau 0000:01:00.0: gsp: RM version: 570.144
13:58mohamexiety[d]: [ 5.189429] nouveau 0000:01:00.0: vgaarb: deactivate vga console
13:58mohamexiety[d]: [ 6.302749] nouveau 0000:01:00.0: drm: VRAM: 32607 MiB
13:58mohamexiety[d]: [ 6.302750] nouveau 0000:01:00.0: drm: GART: 0 MiB
13:58mohamexiety[d]: [ 6.554972] nouveau 0000:01:00.0: drm: MM: using COPY for buffer copies
13:58mohamexiety[d]: [ 6.594207] nouveau 0000:01:00.0: [drm] Registered 4 planes with drm panic
13:58mohamexiety[d]: [ 6.594208] [drm] Initialized nouveau 1.4.0 for 0000:01:00.0 on minor 0
13:58mohamexiety[d]: [ 6.627209] fbcon: nouveaudrmfb (fb0) is primary device
13:58mohamexiety[d]: [ 6.627211] nouveau 0000:01:00.0: [drm] fb0: nouveaudrmfb frame buffer device
13:58mohamexiety[d]: [ 7.546827] snd_hda_intel 0000:01:00.1: bound 0000:01:00.0 (ops nv50_audio_component_bind_ops [nouveau])
13:58snowycoder[d]: Weird, I never had that happen
14:01mohamexiety[d]: I wonder if the DRI3 support spam has anything to do with it but I think that should work
14:01mohamexiety[d]: like we have that, right?
14:10ermine1716[d]: What's that MetaMode Attribute?
14:16mohamexiety[d]: MetaMode attribute?
14:24chikuwad[d]: > [ 6.554972] nouveau 0000:01:00.0: drm: MM: using COPY for buffer copies
14:24chikuwad[d]: I guess
14:40kar1m0[d]: mohamexiety[d]: Never knew you could have two drivers installed on different kernels on the same system
14:42mohamexiety[d]: It’s just different modules at the end of the day and it usually just works™ (just clearly not in this case :KEKW:). You can even have both on the same kernel by blacklisting the driver you don’t want on boot, but I couldn’t get that to work on arch
14:43kar1m0[d]: Meh I prefer not to
14:43kar1m0[d]: Because drivers conflict
14:44mohamexiety[d]: They don’t get loaded at the same time — it’s one or the other (tho the NV driver does have some shenanigans with hijacking the loader so you might have to deal with that but..)
14:44snowycoder[d]: kar1m0[d]: The arch wiki mentions how to do it, the biggest problem is to keep the thing stable, nvidia installs a nouveau blacklist and if you remove it by hand it will be overwritten by the next upgrade.
14:45kar1m0[d]: snowycoder[d]: I don't use arch so idk
15:11mohamexiety[d]: mohamexiety[d]: nvm this is just the 6.16rc7 regression
15:11mohamexiety[d]: it works fine
15:59cubanismo[d]: I actually have proprietary and nouveau on the same kernel. I just toggle between them in a modprobe conf file, and using NVK vs. proprietary userspace was working fine here in latest Mesa from git last week or so.
16:00cubanismo[d]: But yeah, had to build very very latest kernel to get it to load at all on GB20x
16:01cubanismo[d]: notthatclippy[d]: How does the kernel determine whether progress is being made? Not familiar with 51% issue.
16:02mohamexiety[d]: cubanismo[d]: yeah I am doing the same right now. it's just arch needed some extra playing around with conf files to prevent VK loader issues (my other system is fedora which worked without that). the other half of my issues was a regression specific to 6.16rc7 (which was the other kernel I was running without the prop driver) that broke nvk pdev init for Ada and Blackwell specifically :KEKW:
16:02cubanismo[d]: Makes sense.
16:02mohamexiety[d]: but now on the same 6.16 release kernel for both and it all works fine
16:05cubanismo[d]: I did run into another unfortunate recursive init issue with the proprietary userspace + NVK + Zink. Zink was initializing Vulkan while holding some lock, which enumerated all Vulkan devices, which caused the proprietary NV driver to try to enumerate any Vulkan devices it could find (It would have found none), which started by initializing EGL (this sucks, we know it sucks), which tried to
16:05cubanismo[d]: enumerate all EGL devices on the system, which tried to initialize Zink.... which blocked waiting for the non-recursive init mutex Zink was holding from the first part.
16:05cubanismo[d]: So I had to temporarily move the proprietary EGL driver json file out of the way.
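16:06cubanismo[d]: (In my case that meant something like `sudo mv /usr/share/glvnd/egl_vendor.d/10_nvidia.json{,.bak}`; the path may vary by distro. Move it back when you're done.)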
16:06mohamexiety[d]: cubanismo[d]: yeahhh that's a known issue and I also see it only on arch interestingly. chikuwad[d] wrote up a way to fix it up here:
16:06mohamexiety[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13436#note_2983601
16:06notthatclippy[d]: cubanismo[d]: 51% issue tl;dr: Two processes each have >51% of VRAM allocated (overcommit ftw!) that needs to be resident to make progress on a submission. With userspace submission, they can both submit stuff and theoretically end up in a live-lock kind of thing where each one triggers a pagefault that ends up swapping out some memory needed by the other. With kernel submissions, the kernel can
16:06notthatclippy[d]: bless one and make all its allocations resident first.
16:06notthatclippy[d]: Our UMD stack doesn't have this because we don't overcommit. There are ways to solve it with overcommit too, but it can get hairy.
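16:07notthatclippy[d]: Concrete numbers: 24GiB of VRAM, two processes that each need ~13GiB resident to complete a submission. Each one's fault evicts pages the other needs, so neither ever completes. The kernel fix is to pick a winner, make its full ~13GiB resident, run it to completion, then do the same for the loser.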
16:06cubanismo[d]: This is different than the one Faith reported for some Xwayland or GLX init path earlier IIRC, but of a similar nature.
16:08cubanismo[d]: Ah
16:08cubanismo[d]: I always just figured that would be handled with runlist management.
16:09cubanismo[d]: I don't really know what happens at the low levels of kickoff though if a channel isn't scheduled and all that. My understanding is a little higher level.
16:10cubanismo[d]: Regardless, there are going to be situations in which you need to block submission, so a faulting mechanism will probably be needed either way
16:11notthatclippy[d]: runlist management is currently on GSP, and GSP has no knowledge of what vram allocations have been swapped out. It will simply run a scheduled channel until it pagefaults, then wait it out until it gets scheduled again. This needs to be handled higher up.
16:12notthatclippy[d]: You _could_ add it into the whole runlist mgmt logic if you lift that up and implement it in Nova instead of GSP. It's not an unreasonable thing. Or in the "AMP".
16:12cubanismo[d]: Yeah, I assumed we'd have to have a way to influence runlists from the kernel.
16:13cubanismo[d]: Or if AMP refers to what I think it does, then yes, that.
16:13mohamexiety[d]: AI Management Processor™
16:13cubanismo[d]: Who knows what marketing will name things
16:13mohamexiety[d]: :KEKW:
16:13mohamexiety[d]: I am not even joking, that's its name in the blackwell whitepaper
16:13cubanismo[d]: Then yeah, sure.
16:14cubanismo[d]: As I say, I don't even know what the boards are called, let alone what marketing calls things on the chips.
16:15cubanismo[d]: There was a running joke about Ultrashadow back in the day, in that it was an extremely misleading name even though it somewhat accurately described the benefit of whatever hardware it supposedly mapped to.
16:15cubanismo[d]: I forget the details though.
16:16notthatclippy[d]: mohamexiety[d]: Yep. And that's exactly where I learned that name. It'll be CMC in OpenRM sources.
16:16cubanismo[d]: Yeah, that.
16:17cubanismo[d]: If Nova is Blackwell or higher, I assume we could rely on that.
16:17mohamexiety[d]: Turing+
16:18cubanismo[d]: But as I say, my understanding is pretty high level. I had always hoped I could write some of this code, but my career and company priorities sort of went a different direction.
16:18notthatclippy[d]: We're messing around with changing up the GSP interface for this stuff quite drastically, but it's all very experimental now and not really for discussion here. I can send you some reading material per email if you want to get involved.
16:18cubanismo[d]: Still feel like I'm missing all the fun though
16:19cubanismo[d]: Yeah, I'm aware of that project.
16:19cubanismo[d]: I just never have time to do anything.
16:19cubanismo[d]: I wouldn't want to slow it down by pretending I do.
16:24snowycoder[d]: I have a new dEQP test failing on Kepler, but I don't know enough about the kernel for this.
16:24snowycoder[d]: Anyone familiar with nouveau job timeouts?
16:25snowycoder[d]: In summary (from what I understand):
16:25snowycoder[d]: - `dEQP-VK.tessellation.misc_draw.tess_factor_barrier_bug` submits a shader
16:25snowycoder[d]: - The kernel kills the channel (timeout)
16:25snowycoder[d]: - The shader is still somehow running and tries to read/write unallocated memory
16:25snowycoder[d]: - Userspace sees DeviceLost
16:26mhenning[d]: The timeout always kills the context and results in DeviceLost
16:27cubanismo[d]: Yeah, seems like the real problem would be the timeout, which is actually probably some issue with the shader from userspace.
16:27snowycoder[d]: So is that where the "Timeout" test result in dEQP runs comes from, when there are multiple invocations?
16:27snowycoder[d]: How can we solve this if it's in the conformance test?
16:28mhenning[d]: You could try running with NVK_DEBUG=push_sync which will point out which submit is timing out (assuming the bug still happens with the additional synchronization)
16:29mhenning[d]: snowycoder[d]: Not sure what you mean. dEQP's timeouts are different from the kernel timeouts
16:30mhenning[d]: dEQP gives up on a test if it takes more than a few minutes, sometimes it makes sense to just increase that timeout
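16:31mhenning[d]: e.g. something like `NVK_DEBUG=push_sync ./deqp-vk --deqp-case=dEQP-VK.tessellation.misc_draw.tess_factor_barrier_bug`; with push_sync each submit waits for completion before returning, so the one that times out is the one to look at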
16:42snowycoder[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1404867562604466226/output.txt?ex=689cc060&is=689b6ee0&hm=c31a7dfee4626b8c7cb686e238482e5e6ee76433ff9a102c08ee394b7dd694a1&
16:42snowycoder[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1404867563179081869/tess_factor_barrier_bug.amber?ex=689cc060&is=689b6ee0&hm=9a7cba21b4ec39af09a6dad151d9d5b26aad75ad6f92bdb2b24a7419a468f6d7&
16:42snowycoder[d]: I don't know much about submit buffers (I'd love to learn more if I could find resources).
16:42snowycoder[d]: To me it seems like it just sets up the pipeline and kicks it off.
16:43snowycoder[d]: I simplified the original testcase to rule out the original bug (now the tessellation control shader is as simple as possible)
16:47snowycoder[d]: This test will always fail, what I want to avoid is DeviceLost
16:47snowycoder[d]: It seems like enabling the tessellation shaders slows down the vertex shader.
17:29ermine1716[d]: mohamexiety[d]: > Added an "OutputBitsPerComponent" MetaMode attribute that can be used to control the number of bits per color component transmitted via a display connector. If not specified, the driver will choose an optimal color format.
17:29ermine1716[d]: As per phoronix's article about driver release
17:46mohamexiety[d]: Oh it’s a way to specify the bits per color component
17:46gfxstrand[d]: snowycoder[d]: A timeout will turn into device lost, even without the fault.
17:46mohamexiety[d]: So for example if you’re bandwidth constrained due to the cable or such and you have to choose between 4k 60Hz at 8bpc or 4k 120Hz at 6bpc, you can specify what you want
17:47mohamexiety[d]: If you don’t the driver does
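17:47mohamexiety[d]: presumably with the usual MetaMode attribute syntax, something like `nvidia-settings --assign CurrentMetaMode="DP-2: 3840x2160_120 {OutputBitsPerComponent=8}"` (guessing from how the other attributes work, haven't tried this one)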
17:47snowycoder[d]: gfxstrand[d]: so, if the dEQP tests have a very slow shader that triggers a timeout, what should we do?
17:48gfxstrand[d]: Hack the kernel to increase the timeout so the test passes and we can submit CTS. 😂
18:03kar1m0[d]: imagine if drivers just worked by you telling them what to do in human language and not code
18:04chikuwad[d]: kar1m0[d]: me when the driver crashes:
18:05chikuwad[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1404888463651897445/eatshitanddie.mp3?ex=689cd3d7&is=689b8257&hm=5a99dbb5c0b39cb3ff01606bd61ac08c666fd3b2d3f46b62eacf82ba4a0ddb25&
18:22snowycoder[d]: gfxstrand[d]: The timeout is already 10s, pushing it further is a problem if jobs aren't preemptible.
18:22snowycoder[d]: We would need to... create a custom kernel patch just for CTS runs?
18:22mohamexiety[d]: We already did this for Maxwell conformance iirc?
18:23mohamexiety[d]: Either that or Kepler actually
18:23x512[m]: gfxstrand[d]: Timeout is hardcoded and not configurable?
18:24snowycoder[d]: x512[m]: There's a `NOUVEAU_SCHED_JOB_TIMEOUT_MS` in the nouveau kernel module.
18:24snowycoder[d]: But it could be a good idea to add a kernel parameter to change it.
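18:24snowycoder[d]: Untested sketch of what I mean, in nouveau's scheduler code:
18:24snowycoder[d]: static uint job_timeout_ms = NOUVEAU_SCHED_JOB_TIMEOUT_MS;
18:24snowycoder[d]: module_param(job_timeout_ms, uint, 0444);
18:24snowycoder[d]: MODULE_PARM_DESC(job_timeout_ms, "GPU job timeout in milliseconds (default 10000)");
18:24snowycoder[d]: /* ...then arm the scheduler with job_timeout_ms instead of the #define */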
18:25snowycoder[d]: mohamexiety[d]: Oh, I did not know that, sorry. It just seems a bit like cheating?
18:26mohamexiety[d]: Well it’s not like there were other options; without reclocking some tests need a longer timeout as things are super slow
18:29snowycoder[d]: The really weird thing is that the slow operation is in the vertex shader, but running the vertex shader alone works fine.
18:29snowycoder[d]: Somehow enabling tessellation limits vertex shader parallelism
18:31marysaka[d]: snowycoder[d]: how many attributes do you have active
18:32marysaka[d]: they need to share the ISBE space so it's maybe related
18:34gfxstrand[d]: snowycoder[d]: That's what I did to pass conformance and submit
18:34gfxstrand[d]: x512[m]: Yes. It's annoying.
18:34gfxstrand[d]: snowycoder[d]: That's not entirely surprising. Tessellation tends to serialize a lot of stuff.
18:35gfxstrand[d]: There are reasons why everyone hates that feature
18:35gfxstrand[d]: I think more recent NVIDIA is better but TS and GS suck in general
18:36snowycoder[d]: marysaka[d]: In tess-control, 4 float out + 4 float in (hoping that is the representation in ISBE mem), not that many.
18:37HdkR: Tess is always garbage, it's just less garbage on NV than other vendors :D
18:38x512[m]: Really? No benefit to doing tessellation on the GPU compared to the CPU?
18:39snowycoder[d]: gfxstrand[d]: What do you think about a module-parameter?
18:39snowycoder[d]: It could also be used as an opt-in way to limit freezes in normal use (10s for non-preemptible jobs is a bit too much).
18:40mohamexiety[d]: tess and transform feedback, most well beloved gpu features
18:40gfxstrand[d]: snowycoder[d]: I wouldn't be opposed. It'd be really convenient for CTS runs and the like.
19:28gfxstrand[d]: snowycoder[d]: Did you have a kernel patch for me to test on Maxwell A?
19:30snowycoder[d]: gfxstrand[d]: Even just running `dEQP-VK.tessellation.geometry_interaction.passthrough.tessellate_triangles_passthrough_geometry_no_change` deqp testcase and checking kernel logs would help (to confirm the bug is present there too)
19:32snowycoder[d]: If you have time to run a patch and check if it works it would be wonderful, although it's so simple that it should just work (I hope)
19:32gfxstrand[d]: Yeah, I can do that pretty easy
19:32gfxstrand[d]: I just plugged in a GM107
19:34gfxstrand[d]: Test passes here
19:34snowycoder[d]: Is there any log in dmesg?
19:34gfxstrand[d]: nope
19:34snowycoder[d]: I don't know how that should work but I'm not complaining 😂
19:34snowycoder[d]: Thank you!
19:39gfxstrand[d]: I'm gonna swap in a Kepler and poke at it.
19:39snowycoder[d]: gfxstrand[d]: For context: I thought it could be the same bug that's present on Kepler, since bit 14 of gr_gpcs_tpcs_sms_hww_warp_esr_report_mask is set and we can't reset it from userspace with `NVK_MME_SET_PRIV_REG` because of the open firmware.
19:40gfxstrand[d]: Looks like we have a Kepler bug
19:41gfxstrand[d]: I'm sure there are CTS tests which spam OOB_ADDR. I just don't know what they are.
19:41gfxstrand[d]: *Vulkan CTS
19:42gfxstrand[d]: I can't get the GL CTS to run on Zink+NVK+Maxwell. IDK why. EGL weirdness.
19:45snowycoder[d]: gfxstrand[d]: I have found some, they seem to be the same bug, try those:
19:45snowycoder[d]: - `dEQP-VK.query_pool.statistics_query.input_assembly_primitives.primary.32bits_cmdcopyquerypoolresults_patch_list_v4_p2_with_no_color_attachments`
19:45snowycoder[d]: - `dEQP-VK.tessellation.geometry_interaction.passthrough.tessellate_triangles_passthrough_geometry_no_change`
19:47snowycoder[d]: In my run (KeplerA) there is a lot of `[MISALIGNED_ADDR]` spam, I still need to check those
19:49snowycoder[d]: gfxstrand[d]: Can you check comments on MR !36393 while you are around Kepler stuff?
19:50gfxstrand[d]: Oops: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36745
19:51gfxstrand[d]: snowycoder[d]: dEQP-VK.tessellation.geometry_interaction.passthrough.tessellate_triangles_passthrough_geometry_no_change passes almost instantly for me on Kepler A but throws OOR_ADDR
19:52snowycoder[d]: Yep, it does that on my GPUs too
19:52snowycoder[d]: While `dEQP-VK.tessellation.misc_draw.tess_factor_barrier_bug` thinks for 10s and gives up
19:54snowycoder[d]: karolherbst[d]: That patch should fix the OOR_ADDR warnings
19:54gfxstrand[d]: Weird that it doesn't OOR_ADDR on Maxwell
19:55karolherbst[d]: might be disabled already
19:55gfxstrand[d]: I wonder if it errors on something recent if I re-enable the exception
19:55karolherbst[d]: mhhh.. nouveau sets it to `0x00d3eff2` on maxwell1 as well..
19:56gfxstrand[d]: Yup
19:56snowycoder[d]: karolherbst[d]: Maybe they changed the bit location? that 0xd is different from Kepler
19:56snowycoder[d]: (although this is unlikely, later generations seem to use that bit too)
19:56gfxstrand[d]: Let me plug in a Maxwell B and flip exceptions back on
19:56karolherbst[d]: ohh, you tested on maxwell2?
19:58gfxstrand[d]: I tested on GM107
19:58gfxstrand[d]: Now I'm testing on GM200
19:58karolherbst[d]: ahh
20:01gfxstrand[d]: Yeah, nothing on GM200 with the exception re-enabled. I think we have a Kepler bug
20:02snowycoder[d]: Why did we disable the exception on MaxwellB+ in the first place then
20:02gfxstrand[d]: It was blowing up some GL CTS tests (and maybe some Vulkan ones as well but I don't have those documented)
20:02gfxstrand[d]: But not that test
20:57gfxstrand[d]: These tessellation shaders are annoyingly simple
21:00snowycoder[d]: That's the annoying part, I think the bug is the same as the old one
21:03gfxstrand[d]: I wonder if this is related to the CTX timeouts thing I found with non-const offsets
21:19snowycoder[d]: What timeouts?
21:19gfxstrand[d]: non-constant vld offsets with multiple components were timing out
21:19gfxstrand[d]: There's a comment about it in nir_lower_vtg_io
21:20gfxstrand[d]: Oh, I'm going to bet that test is failing simply because the SSBO is in system ram
21:21gfxstrand[d]: Or not...
21:36gfxstrand[d]: snowycoder[d]: If I hack NVK to remove the atomic, the test passes. It's just taking forever
21:36gfxstrand[d]: Probably because the atomic is going across PCI
21:38gfxstrand[d]: It's also possible I suppose that the atomic is busted somehow
21:38snowycoder[d]: But doing the same thing with only a vertex shader (no tess shaders) is way faster.
21:38snowycoder[d]: Maybe because of `RED` instruction benefitting from parallelism?
21:39snowycoder[d]: (also, why is SSBO going through PCI?)
21:40gfxstrand[d]: No VRAM maps on Kepler
21:40gfxstrand[d]: snowycoder[d]: The test I'm looking at does the atomic in the VS
21:41snowycoder[d]: You're checking `dEQP-VK.tessellation.misc_draw.tess_factor_barrier_bug` right?
21:43snowycoder[d]: If you copy that and remove the tess shaders (leaving only vertex -> fragment), updating the draw call accordingly (indexed rect draw), it passes.
21:43snowycoder[d]: And that is weird since the vertex shader is the one slowing everything down, but the number of vertices it processes doesn't change
21:43snowycoder[d]: gfxstrand[d]: So that is a hardware limitation? :/
21:44gfxstrand[d]: snowycoder[d]: yes
21:44gfxstrand[d]: snowycoder[d]: Yeah, that's super weird
21:46snowycoder[d]: If the tessellation shader being enabled serializes everything, maybe the memory controller cannot reduce the atomic calls.
21:46snowycoder[d]: That would be really slow if you add the overhead of PCI
21:47gfxstrand[d]: I could believe that the hardware massively reduces the parallelism of the VS when tess is in play, assuming that the VS is cheap compared to the rest of the geometry (tess exists to expand geometry, after all).
21:48Lyude: snowycoder[d]: btw- do you still have any questions? I noticed you mentioned me and dakr regarding some kernel patches and kepler
21:52snowycoder[d]: Lyude: There's still nothing 100% confirmed, but I might submit some patches for Kepler kernel module:
21:52snowycoder[d]: One to disable OutOfRange addr warnings.
21:52snowycoder[d]: The other adds a module parameter to raise the job timeout (this would be helpful for CTS tests with older cards).
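21:53snowycoder[d]: The first one is basically one line in the gr init code, clearing bit 14 (OOR_ADDR) from the report mask; the offset is my guess from the register name, so treat it as a sketch:
21:53snowycoder[d]: nvkm_mask(device, 0x419e44, BIT(14), 0x00000000); /* gr_gpcs_tpcs_sms_hww_warp_esr_report_mask */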
21:54snowycoder[d]: gfxstrand[d]: is the tessellation OOR warning bug confirmed?
21:54snowycoder[d]: I think the only solution is to disable the warning, but if you find other methods that would be awesome.
22:04gfxstrand[d]: snowycoder[d]: What specifically were you looking for me to confirm?
22:06snowycoder[d]: gfxstrand[d]: This ^; we confirmed the Kepler bug is different from Maxwell. Do you think ignoring OOR warnings is the right solution here as well?
22:08gfxstrand[d]: Not sure until we have a root cause