00:00 redsheep[d]: Hmm. I have never tried to capture on nouveau before, it seems it might be a bit challenging. The performance hit on the nvc0 x11 session that I have usually stuck to for being the most stable is pretty rough, like 10x or more performance loss. Might have to get my zink session working again before I can capture
00:26 gfxstrand[d]: Oof
00:31 gfxstrand[d]: I wonder if it would work better to run your desktop on an integrated card for the capture.
00:31 gfxstrand[d]: But IDK if you even have an integrated card
00:32 redsheep[d]: I have one, but I have been keeping it shut off in bios to avoid any funny business with it getting used when I don't want it. That might work. Right now I am trying to sort out an issue building lib32 mesa though
00:59 redsheep[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1290840621124161567/buildError.txt?ex=66fdec82&is=66fc9b02&hm=3152f2d3cd36f4a3dfec5cf2dd11864aedbf76ee4b269ccd019fe843a71e8cff&
00:59 redsheep[d]: Anybody know what might cause this, particularly with a lib32 build? My 64 bit version built fine with a very similar script
01:33 gfxstrand[d]: Ugh... More bindgen bugs, probably. I can try to look at it tomorrow. I need to do some 32-bit builds myself this week for something else.
01:47 redsheep[d]: I'm not quite sure what is going on, deleting nvk from my build makes it run fine, but I can still build dodo's aur just fine... It's probably something messed up with my scripts, but it's nearly impossible to tell exactly where and I probably just need to start over on my custom build scripts
01:48 redsheep[d]: Doesn't matter though, for now dodo's thing is fine since lib32 isn't likely something I will use
01:54 redsheep[d]: Ok yeah zink+nvk doesn't render sddm right now so session testing is mostly a nonstarter
02:02 redsheep[d]: redsheep[d]: Ok while a zink session does not work, an nvc0 wayland session does afford enough headroom to record the screen, aside from randomly hitching everything for 20 seconds here and there
02:05 redsheep[d]: I can also replicate zink not loading discord, but games seem fine so I should be able to get some footage
03:31 redsheep[d]: Oh good news! Talos principle got very notably faster, probably about 50%
03:35 Pheoxy[AWSTUTC8][m]: <gfxstrand[d]> "I wonder if it would work better..." <- https://flathub.org/apps/com.dec05eba.gpu_screen_recorder... (full message at <https://matrix.org/oftc/media/v1/media/download/AavmlIu-eT4Y3lH4i0obOETYhzorX5ByscKdCFStZCJgoS5h15D_WpwvFotkTSXXRCYgXbfuyJBsg1JfDlLcCCZCeSTQMr3gAG1hdHJpeC5vcmcvRGxyTUlEQU9YUkJSaHN5RlNydVF5elpW>)
03:37 Pheoxy[AWSTUTC8][m]: Also, what video and audio codec are preferred by your editor
03:38 gfxstrand[d]: H264 and AV1 should both be fine, I think. I'm not sure about audio but I also doubt we'll use the audio so just do whatever makes sense.
03:44 gfxstrand[d]: Pheoxy[AWSTUTC8][m]: BTW, the Matrix -> IRC -> Discord translation on that really didn't work out well. It looks like the Matrix bridge gives up on long messages and drops a long link instead (which was almost as long as the message. 😂)
03:44 gfxstrand[d]: I don't know if there's a good solution. Just pointing it out in case you weren't aware.
03:45 gfxstrand[d]: I could read it, I just had to follow a link.
03:45 Pheoxy[AWSTUTC8][m]: Ah, I just use element and don't really use bridges at the moment
03:48 redsheep[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1290883142143709204/2024-10-01_21-43-44.mp4?ex=66fe141c&is=66fcc29c&hm=eaff534d424eef46f01a46bb20d442fab0849df5aef24668d40e676d4b5aa963&
03:48 redsheep[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1290883144823865364/2024-10-01_21-40-12.mp4?ex=66fe141d&is=66fcc29d&hm=d164a428eb701000900ce2022b71616e9ddea774d338f97e6936d437fb155b5c&
03:48 redsheep[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1290883150716600321/2024-10-01_21-35-57.mp4?ex=66fe141e&is=66fcc29e&hm=d917665cbe4f5f46ba4bfe1ea19a791e97034805db726f166718be9635e4769d&
03:48 redsheep[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1290883160438997092/2024-10-01_21-27-30.mp4?ex=66fe1420&is=66fcc2a0&hm=948f53e0f52d942d8664c26b317b2d166a1fccb8664298cb65d6729b2b6f0df9&
03:48 redsheep[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1290883161517068319/2024-10-01_21-17-38.mp4?ex=66fe1421&is=66fcc2a1&hm=d1244b2b557fcfdc2f40100738d34efb0939bf57202d041a45bb98c99ed61f99&
03:48 redsheep[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1290883162712313856/2024-10-01_20-56-47.mp4?ex=66fe1421&is=66fcc2a1&hm=2a90778be9568fa9f568ca365e564f6a4c5fb9cf698390043aca7759280ef38c&
03:48 redsheep[d]: Hmm doom eternal got faster but it's probably still not really a showcase. Got a decent video of the witness, talos principle, deep rock galactic, and supertuxkart, which covers DXVK, VKD3D-Proton, Zink, and native Vulkan. I recorded cyberpunk 2077 and horizon zero dawn, but both show artifacting and are pretty choppy.
03:51 redsheep[d]: gfxstrand[d]: Enjoy
04:10 tiredchiku[d]: redsheep[d]: I know what this is
04:10 tiredchiku[d]: gimme 5 mins
04:13 tiredchiku[d]: add `export TARGET=i686-unknown-linux-gnu` to your 32 bit build
04:14 tiredchiku[d]: meson just fails to tell the generator for which target to generate so it assumes the host architecture.
04:37 redsheep[d]: Oh that's pretty cringe
04:40 tiredchiku[d]: it is
04:54 HdkR: https://github.com/mesonbuild/meson/issues/13591 That's really good to know. I disabled asahi and nouveau for 32-bit locally because of that
04:55 HdkR: and rusticl
04:55 tiredchiku[d]: yup
05:30 rinlovesyou[d]: redsheep[d]: Yeah i did notice that recently
05:54 tiredchiku[d]: :Thonk:
05:54 tiredchiku[d]: :worksonmymachine:
05:55 tiredchiku[d]: oh
05:55 tiredchiku[d]: oh wait I have the env var in .bash_profile, not /etc/environment
09:03 notthatclippy[d]: Is there any usermode client of nouveau, other than what is in mesa?
10:31 karolherbst[d]: there is the old xorg driver
10:31 karolherbst[d]: but that one uses libdrm
10:59 notthatclippy[d]: Right, I mean anything else making DRM_NOUVEAU ioctls and such?
11:00 notthatclippy[d]: I guess a second question, is there a planned consumer of Nova uAPIs other than NVK?
11:02 notthatclippy[d]: (I mean, obviously, anyone could be making direct ioctls from anywhere in userspace, even closed source, so they do need to be stable. But the actual APIs I guess are specifically designed for mesa alone)
11:28 karolherbst[d]: I think that's TBD. Nova could also have a compatible UAPI just to make life easier for us
11:29 karolherbst[d]: though I guess some parts especially context creation would probably be different while at it
12:06 notthatclippy[d]: This is a prime opportunity to fix up the uapi mistakes. Though looking at nouveau, the uapi is so tiny the mistakes are minimal
12:07 notthatclippy[d]: But especially for Nova there are/will be much better ways to do things
12:07 notthatclippy[d]: A lot of it depends on how averse nvk is to making syscalls
12:52 djdeath3483[d]: gfxstrand[d]: Kind of curious what building blocks you need to make 64atomics out of the 32 ones
13:21 gfxstrand[d]: djdeath3483[d]: Oh, we have 64-bit compare exchange, so all we need is a loop. It's not like Intel where you have to build an actual lock.
13:23 djdeath3483[d]: right
13:23 djdeath3483[d]: plus whatever inside the subgroup
13:27 gfxstrand[d]: Yeah, we'll probably do subgroup shenanigans so we can reduce the number of iterations.
13:36 karolherbst[d]: oh yeah.. Intel not having 64 bit shared CAS is just pure pain
13:50 djdeath3483[d]: I guess doable in NIR
13:50 djdeath3483[d]: just ultra slow
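(Editor's sketch of the "all we need is a loop" idea from gfxstrand[d] above. This is a CPU-side C11 illustration, not NIR or NVK code; the helper name `atomic_max_u64_via_cas` is hypothetical, and the subgroup reduction mentioned in the discussion is not shown.)

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a 64-bit atomic max built only from 64-bit
 * compare-exchange. Returns the value that was stored before the update,
 * the way a native atomic op would. */
static uint64_t atomic_max_u64_via_cas(_Atomic uint64_t *addr, uint64_t val)
{
    uint64_t old = atomic_load_explicit(addr, memory_order_relaxed);
    for (;;) {
        uint64_t desired = old > val ? old : val;
        /* If another thread changed *addr in the meantime, the exchange
         * fails, `old` is refreshed with the current contents, and we
         * recompute `desired` on the next iteration. */
        if (atomic_compare_exchange_weak_explicit(addr, &old, desired,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return old;
    }
}
```

Any other read-modify-write op (add, min, xor, ...) follows the same shape: only the line computing `desired` changes.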
13:50 tiredchiku[d]: trying to pull from a bare clone of the mesa repo results in
13:50 tiredchiku[d]: fatal: bad object refs/remotes/origin/24.2
13:50 tiredchiku[d]: error: /home/sidpr/documents/pkgs/mesa-git/mesa did not send all necessary objects
13:51 tiredchiku[d]: is this a known thing?
14:02 tiredchiku[d]: tiredchiku[d]: use this instead `export BINDGEN_EXTRA_CLANG_ARGS="--target=i686-unknown-linux-gnu"`
14:41 tiredchiku[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31488
14:41 tiredchiku[d]: :D
16:04 tiredchiku[d]: was just going through the issue tracker and looking for some low hanging fruit
16:38 tiredchiku[d]: closure candidate: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10681
16:38 tiredchiku[d]: implemented by !31033
16:41 gfxstrand[d]: tiredchiku[d]: I'll do a full CTS run on it in an hour or two and merge once that's done.
16:42 tiredchiku[d]: okie
17:05 tiredchiku[d]: I dunno if I'm taking the right approach to this .-.
17:06 tiredchiku[d]: I look at how things are done in other drivers
17:06 tiredchiku[d]: and anything more complex than a "oh this just needs to be enabled" makes my brain stop working
17:06 tiredchiku[d]: struggling to figure out address_binding_report
18:35 phomes_[d]: tiredchiku[d]: Wait. Are there others from Denmark here or what is with the danish flag reaction?
18:36 tiredchiku[d]: average Dodo things
18:36 tiredchiku[d]: (random reacts)
18:41 gfxstrand[d]: tiredchiku[d]: Running CTS now
18:41 tiredchiku[d]: all the best :D
18:57 airlied[d]: notthatclippy[d]: Don't know of any plans, maybe rocm or lvl0 :-p
18:59 notthatclippy[d]: Heh, good point.
19:00 gfxstrand[d]: notthatclippy[d]: I think the new APIs we added for NVK are fine and there's no need to change them. As karolherbst[d] said, device enumeration and maybe memory allocation needs some work.
19:00 gfxstrand[d]: But the new bind/submit ioctls are fine
19:01 gfxstrand[d]: notthatclippy[d]: I'd love to see CUDA run on it. 😉
19:02 notthatclippy[d]: For context, I'm just fishing around to see what is (planned to be) exposed and whether that would/should impact the APIs that GSP itself exposes.
19:03 notthatclippy[d]: gfxstrand[d]: Now that would be a gnarly thing to untangle, for non-technical reasons.
19:03 redsheep[d]: Cuda just having official support for Nova would be a great solution IMO, if that would make it possible for mesa to just use cuda prop stuff to wire up dlss and Nvidia users can run otherwise fully open drivers without losing cuda support
19:04 redsheep[d]: And mesa can stick to just implementing more open apis
19:04 notthatclippy[d]: I'll let someone else explain why that won't work :P
19:05 gfxstrand[d]: Once we get to the point where NVK/Nova can do userspace submit and fencing, I think those conversations are a lot more possible. Until then, I doubt there's any good reason why NVIDIA would want to touch anything but their KMD.
19:05 karolherbst[d]: prolly need a solution for usermode command submission first
19:05 tiredchiku[d]: karolherbst[d]: how would this come into the picture /genq
19:06 notthatclippy[d]: karolherbst[d]: "Why don't you just..." ...do what we do?
19:06 notthatclippy[d]: I doubt this sort of thing is something you can reasonably abstract to other mesa drivers
19:07 karolherbst[d]: because it touches driver agnostic UAPI I think
19:07 karolherbst[d]: and that means reaching consensus with everybody else
19:07 karolherbst[d]: there are some threads on that topic on dri-devel already
19:08 notthatclippy[d]: Yeah, I've keenly been following the ARM RFC
19:08 karolherbst[d]: the submission part itself isn't the problem, it's the fencing/syncobj one which is more of a problem
19:09 notthatclippy[d]: I think that might depend a lot on the memory model
19:09 gfxstrand[d]: __sima__: and I have been talking about it. There's at least the shape of a potential plan.
19:09 notthatclippy[d]: But it's far from my area of expertise so I'll gladly concede it's complicated.
19:10 notthatclippy[d]: I can find someone on our end if you want to bounce ideas off for what the hardware will or won't fight you for.
19:10 tiredchiku[d]: gfxstrand[d]: the concepts of a plan? :D
19:10 gfxstrand[d]: But introducing userspace fences without breaking the entirety of DRM and any other dma-buf consumers is tricky indeed.
19:13 gfxstrand[d]: I really do hope to make progress on it in the next few years, though.
19:13 redsheep[d]: tiredchiku[d]: Afaict if all the technical and non-technical details could be worked out that extension could be implemented in mesa by calling over to cuda the same way the prop vulkan drivers do. I could be wrong about how this works though.
19:14 tiredchiku[d]: hmm
19:14 tiredchiku[d]: neat
19:14 tiredchiku[d]: big hurdles to cross before we get there
19:15 tiredchiku[d]: tiredchiku[d]: also unrelated, but might start taking a goblin approach, which is enable it first and then figure out what it wants from me :LUL:
19:15 redsheep[d]: I'm sure there's miles of legalese and approvals that need to be sorted through
19:16 gfxstrand[d]: tiredchiku[d]: That's what I usually do. 😂
19:16 tiredchiku[d]: amazing
19:16 tiredchiku[d]: good to know I have the right mindset and approach now :D
19:16 gfxstrand[d]: I mean, I also look at the extension spec. But you always start by turning it on.
19:17 tiredchiku[d]: yeah, I keep that page open too, but it doesn't usually explain much to me
19:17 tiredchiku[d]: hopefully that changes now
19:17 karolherbst[d]: CTS driven development is the true and only approach, don't @ me 🙃
19:17 tiredchiku[d]: also, spent this evening entirely on zink+nvk
19:17 tiredchiku[d]: kernel 6.12-rc1, and my MR's branch as mesa-git
19:18 tiredchiku[d]: tiredchiku[d]: outside of a few random firefox crashes, and this visual jitter that'd happen infrequently and randomly, it was okay
19:19 redsheep[d]: ... You managed to make a session work? That's exactly the setup I was on last night where it was a nonstarter
19:19 tiredchiku[d]: electron/chromium failed to launch tho
19:19 tiredchiku[d]: redsheep[d]: https://discord.com/channels/1033216351990456371/1034184951790305330/1290914994430414928
19:20 redsheep[d]: tiredchiku[d]: Huh? Are you saying that you don't get a working session if you remove it from there and put it in /etc/environment?
19:20 tiredchiku[d]: dunno, haven't tried
19:21 redsheep[d]: I've always put it in environment and when it used to work that was fine for ages
19:21 redsheep[d]: I doubt that's my issue
19:21 tiredchiku[d]: I can try if you want me to
19:21 redsheep[d]: tiredchiku[d]: Would be good, just to isolate
19:21 tiredchiku[d]: but yeah, zink+nvk never really broke for me apart from that time I had issues with device select
19:21 tiredchiku[d]: that I still wanna debug
19:21 tiredchiku[d]: redsheep[d]: okie let me finish this video and I'll try it out
19:23 redsheep[d]: I am still not really sure that device select isn't the issue
19:24 notthatclippy[d]: gfxstrand[d]: By the way - and I just want to plant a seed for you to think about - even outside of userspace submission, a lot of our HW/FW was designed with the idea that you can bypass the kernel and do things from userspace directly. Now, the reasons we had for that don't fully apply to Nova/NVK stack, but it does still offer some benefits.
19:24 notthatclippy[d]: If this is a thing of interest here, then we can take your needs into account when developing the interface and features.
19:25 notthatclippy[d]: Totally random example, but on Volta+ there's a PTIMER alias meant to be mapped into userspace so you don't have to make syscalls to get the value.
19:26 tiredchiku[d]: redsheep[d]: :worksonmymachine:
19:26 karolherbst[d]: I'm surprised it mattered enough that nvidia cares
19:27 tiredchiku[d]: ...I wanna try a thing, hang on
19:28 tiredchiku[d]: if electron/chromium is borked rn
19:28 tiredchiku[d]: is CEF also borked
19:28 tiredchiku[d]: IT IS!
19:28 notthatclippy[d]: karolherbst[d]: Some of our UMDs use it a lot internally. But also vscode and other electron apps seem to spam the GL_TIMESTAMP query way too often.
19:28 karolherbst[d]: I see...
19:30 gfxstrand[d]: notthatclippy[d]: Yeah, to me the ideal kernel API is one that handles device enumeration, memory allocation, page tables (they have physical addresses so the kernel kind-of has to be involved), and some synchronization stuff, and that's about it.
19:30 tiredchiku[d]: works on current stable release
19:30 tiredchiku[d]: time for a bisect
19:31 tiredchiku[d]: ..how do I always manage to nerdsnipe myself into bisecting things
19:31 gfxstrand[d]: notthatclippy[d]: That's annoying but it makes sense
19:32 asdqueerfromeu[d]: gfxstrand[d]: How about userptrs?
19:32 karolherbst[d]: timestamps might be a good start to try out this idea of mapping things into userspace, because it's way less critical than actual submission
19:32 karolherbst[d]: asdqueerfromeu[d]: that's page tables
19:32 gfxstrand[d]: asdqueerfromeu[d]: That falls under the category of memory management and/or page tables.
19:32 gfxstrand[d]: karolherbst[d]: Oh, I'd happily take a timestamp page.
19:33 karolherbst[d]: yeah, it's a great thing to play around with
19:33 gfxstrand[d]: Just add a query to give me a map offset and treat it like any other BO map.
19:33 notthatclippy[d]: <https://github.com/NVIDIA/open-gpu-kernel-modules/blob/ed4be649623435ebb04f5e93f859bf46d977daa4/src/common/sdk/nvidia/inc/class/clc361.h#L27>
19:33 notthatclippy[d]: You want a uapi to map this range.
19:34 notthatclippy[d]: Although the range and the layout may change from chip to chip, so the userspace should pull in the class header
19:34 gfxstrand[d]: Yeah, that's fine.
19:40 tiredchiku[d]: tiredchiku[d]: 11 steps left..
19:41 tiredchiku[d]: let's disable the 32 bit build to speed this up significantly
19:41 notthatclippy[d]: notthatclippy[d]: This is roughly also how all the GPU telemetry gets streamed out now. For Nova, the likely answer here is that the kernel will map it and then expose via hwmon and you just accept the syscall overhead since usually whatever is reading this can afford to be slow. But if, for example, NVK wanted to be aware of when it hits thermal throttling, this would be a way to do it overhead-free
19:42 redsheep[d]: tiredchiku[d]: That's what I expected. I'll have to debug more later, it's probably related to discord and chromium not launching.
19:42 tiredchiku[d]: no, those don't launch for me too
19:42 notthatclippy[d]: (the caveat there is that stabilizing this SW-defined page is a _massive_ PITA that I still don't have a good answer for. But I expect I will before Nova gets to a point where this matters)
19:42 redsheep[d]: tiredchiku[d]: Wait what exactly are you bisecting? I thought it always worked
19:43 tiredchiku[d]: I'm currently bisecting for that
19:43 tiredchiku[d]: discord, chromium, spotify, steam
19:43 tiredchiku[d]: all borked
19:43 tiredchiku[d]: on mesa-git
19:43 redsheep[d]: Ok, sounds like a plan
19:44 tiredchiku[d]: but works on stable 24.2.3
19:44 tiredchiku[d]: so, bisecting
19:44 redsheep[d]: And I'm glad to hear there's a path for the telemetry data, it's painful not having that sometimes
19:44 gfxstrand[d]: notthatclippy[d]: Ooh! That sounds useful... I'd be happy to pull counters out of a mapped page and expose them via KHR_performance_query
19:46 redsheep[d]: Does mangohud have a backend to use that ext? If so that would be incredible
19:46 tiredchiku[d]: not hard to hook it up if it doesn't
19:46 notthatclippy[d]: redsheep[d]: On the proprietary drivers it uses that as of... r555 maybe?
19:47 redsheep[d]: That's amazing
19:47 tiredchiku[d]: notthatclippy[d]: it doesn't
19:47 tiredchiku[d]: still uses nvml/nvctrl
19:48 airlied[d]: Can there be multiple copies of the ptimer user?
19:48 notthatclippy[d]: nvml uses RUSD (RM/User Shared Data) under the hood now.
19:49 notthatclippy[d]: airlied[d]: AFAIK yes. I can test real quick.
19:49 tiredchiku[d]: notthatclippy[d]: ah, under the hood, maybe
19:49 tiredchiku[d]: I do know mangohud itself doesn't use the extension
19:50 notthatclippy[d]: nvml (and nvapi) is the way to expose that in an ABI-stable way for now while we're still soul-searching on it.
19:50 tiredchiku[d]: makes sense
19:50 tiredchiku[d]: honestly waiting for the next gen gpu release
19:50 tiredchiku[d]: so that we get a GSP version bump in linux-firmware 😅
19:51 asdqueerfromeu[d]: gfxstrand[d]: Don't forget fdinfo though
19:55 notthatclippy[d]: (yes, I can mmap and read ptimer in parallel from many processes)
19:56 tiredchiku[d]: 7 steps left..
20:02 gfxstrand[d]: notthatclippy[d]: Probably want to special case it in the kernel and only allow read-only maps but, yeah, I don't see why there would be a problem.
20:03 notthatclippy[d]: I expect you'd get a SIGBUS or something if you attempt to write to it, so you could also just hand out rope and noose schematics 🙉
20:04 gfxstrand[d]: Yeah, I think RO maps sigbus on write.
20:05 notthatclippy[d]: SIGSEGV IIRC. But on second thought, if you mapped it RW, or if kernel wrote to the VOLTA_USERMODE_A, maybe it would just be ignored...
20:06 karolherbst[d]: depends on what's mapped under the hood
20:06 karolherbst[d]: and how "setting the GPU clock" even works
20:06 karolherbst[d]: not sure we want some UAPI which guarantees that writes do nothing
20:07 karolherbst[d]: because if there is hardware where it does something we are kinda screwed
20:08 tiredchiku[d]: 3 steps, getting close
20:08 karolherbst[d]: I know that const pointers are a little bit weird in C, but I hope they aren't _that_ broken that compilers would randomly write to it
20:08 karolherbst[d]: worst case we file some gcc/clang bugs
20:09 gfxstrand[d]: Yeah, I think getting a RO map is fine. If you put something in the .data section of your .so, it ends up in a constant page that might be de-duplicated between processes.
20:09 gfxstrand[d]: CPU page tables usually have reliable write protection bits.
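(Editor's sketch of the read-only map idea discussed above: userspace gets a file descriptor plus a map offset, maps the page with `PROT_READ` only, and then reads a counter with a plain load instead of a syscall. The helper names and the idea of a 64-bit counter at a byte offset are assumptions for illustration; no real nouveau or Nova uAPI is shown. On Linux, a store through such a mapping faults rather than reaching the backing page.)

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical helper: map `len` bytes of `fd` at page-aligned `offset`
 * read-only. Returns NULL on failure. */
static const volatile void *map_readonly(int fd, off_t offset, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, offset);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: read a 64-bit counter from the mapped page.
 * `volatile` forces a fresh load every call, since the producer (hardware
 * or kernel, in the scenario discussed) updates it behind our back. */
static uint64_t read_counter(const volatile void *page, size_t byte_offset)
{
    const volatile char *base = page;
    return *(const volatile uint64_t *)(base + byte_offset);
}
```

In the scenario from the chat, `fd` would be the DRM device and `offset` would come from a query ioctl; here both are left to the caller, so the sketch works against any mappable file.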
20:11 gfxstrand[d]: tiredchiku[d]: Looks like it's more than zero work. There were two test failures. I left them in a comment on the MR.
20:11 tiredchiku[d]: did just see that, I'll take a look after I finish this bisect
20:16 tiredchiku[d]: currently watching a stream on twitch while bisecting on plasma wayland ziNVK
20:16 tiredchiku[d]: going good
20:16 tiredchiku[d]: no weirdness in the past hour
20:22 tiredchiku[d]: 212d57f7e6701a4f307c2c049a0e3eccfce58965 is the first bad commit
20:23 tiredchiku[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31079/diffs?commit_id=212d57f7e6701a4f307c2c049a0e3eccfce58965
20:24 tiredchiku[d]: will try to fix that tomorrow
20:38 tiredchiku[d]: I'll look at both things tomorrow, it's 0208 here
20:45 redsheep[d]: tiredchiku[d]: Weren't you saying it still fails with hardware acceleration off? That's pretty weird
20:45 tiredchiku[d]: yeah
20:45 tiredchiku[d]: ¯\_(ツ)_/¯
20:45 tiredchiku[d]: electron works in mysterious ways
20:46 tiredchiku[d]: gfxstrand[d]: if I understand this right I "just" need to enforce clamping if the DepthReplacing execution mode is set in the pipeline
20:49 tiredchiku[d]: ..now to find where the pipeline code is e-e
20:50 tiredchiku[d]: but anyway, future me headache
21:09 skeggsb9778[d]: notthatclippy[d]: VOLTA_USERMODE_A is meant to be mapped RW to userspace anyway, it's where the doorbell lives too
21:14 sima: gfxstrand[d], I think our plan is still really solid, at least I haven't come up with a gap from the kernel side
21:14 sima: and from my very rough vk standard understanding that side should be doable too
21:39 __sima__[d]: gfxstrand[d]: oh now I get where that irc ping is from 🤪
22:22 gfxstrand[d]: skeggsb9778[d]: Would there be any problems created if userspace is able to bang the doorbell?
22:26 skeggsb9778[d]: Userspace is *meant* to be ringing the doorbell
22:27 skeggsb9778[d]: You could ring another runlist/channel's doorbell, but the worst that happens is the runlist wakes up and re-checks for channels with pending work
22:28 gfxstrand[d]: Okay, as long as there's no harm in extra doorbell rings, then I think we're fine.
22:28 gfxstrand[d]: So is there one doorbell for everything?
22:28 skeggsb9778[d]: As long as you don't write GPPUT in USERD before you've filled the data in the GPFIFO/push buffers, nothing terrible will happen
22:29 skeggsb9778[d]: Yes, VOLTA_USERMODE_A is a shared address range. The doorbell reg takes runlist+chid
22:29 skeggsb9778[d]: (it's technically an opaque handle from the RM, but in current HW that maps to runlist+chid in practice)
22:30 skeggsb9778[d]: on Volta, it's just chid
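(Editor's sketch of the ordering rule skeggsb9778[d] describes: fill the GPFIFO entry first, then publish GPPUT, then ring the doorbell. Everything here is a plain-memory stand-in with hypothetical names (`fake_channel`, `submit`, `RING_ENTRIES`); real USERD and doorbell accesses are MMIO and would need whatever barriers the platform requires, and a real ring would also check for space before writing.)

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_ENTRIES 8u /* hypothetical ring size */

/* Plain-memory stand-in for a channel; not a real hardware layout. */
struct fake_channel {
    uint64_t gpfifo[RING_ENTRIES]; /* stand-in for GPFIFO entries */
    _Atomic uint32_t gpput;        /* stand-in for GPPUT in USERD */
    _Atomic uint32_t doorbell;     /* stand-in for the shared doorbell reg */
    uint32_t chid;                 /* handle written to the doorbell */
};

static void submit(struct fake_channel *ch, uint64_t entry)
{
    uint32_t put = atomic_load_explicit(&ch->gpput, memory_order_relaxed);

    /* 1. Fill in the data the consumer will fetch. */
    ch->gpfifo[put % RING_ENTRIES] = entry;

    /* 2. Publish the new put pointer. Release ordering keeps the entry
     *    store above from being reordered after this store. */
    atomic_store_explicit(&ch->gpput, (put + 1) % RING_ENTRIES,
                          memory_order_release);

    /* 3. Ring the doorbell. Per the discussion, a spurious ring only makes
     *    the runlist re-check for pending work. */
    atomic_store_explicit(&ch->doorbell, ch->chid, memory_order_release);
}
```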
23:07 gfxstrand[d]: tiredchiku[d]: The annoying bit is that `depthClampZeroOne` and `VK_EXT_depth_range_unrestricted` kind-of conflict. Section 29.10.1 of the Vulkan spec has the rules for sorting it out:
23:07 gfxstrand[d]: https://registry.khronos.org/vulkan/specs/1.3-extensions/html/chap29.html#_depth_clamping_and_range_adjustment
23:08 gfxstrand[d]: If we have to do the clamp in the shader (I suspect we do), we'll have to push a uniform based on the setting of those two bits and only clamp in the shader if the uniform is set.
23:08 gfxstrand[d]: It'll only add like 2 instructions so I'm not too worried about the perf implications.
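(Editor's sketch of the "clamp in the shader only if the uniform is set" idea, written as plain C rather than NIR. The function name and the single boolean `clamp_zero_one` are hypothetical; how the driver derives that boolean from `depthClampZeroOne` and `VK_EXT_depth_range_unrestricted` is governed by the spec section linked above and is deliberately not encoded here.)

```c
#include <stdbool.h>

/* Hypothetical lowering of a shader depth write: `clamp_zero_one` plays the
 * role of the pushed uniform. When it is false the value passes through
 * untouched; when true it is clamped to [0, 1]. */
static float lower_frag_depth(float depth, bool clamp_zero_one)
{
    if (!clamp_zero_one)
        return depth;
    if (depth < 0.0f)
        return 0.0f;
    if (depth > 1.0f)
        return 1.0f;
    return depth;
}
```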