00:00karolherbst[d]: mhhh I wonder...
00:00karolherbst[d]: why is fma special :ferrisUpsideDown:
00:02karolherbst[d]: anyway, a lot of cursed stuff is going on there
00:55gfxstrand[d]: Why is fadd special?!? That's the one that really bothers me. I'm sure it's probably `fma(x, 1, y)` internally or something like that. 🙄
00:56HdkR: Oh no, the secret
00:57HdkR: Now you know it's fma with hardcoded 1 as the multiplier
00:57HdkR: "Anyway"
01:07gfxstrand[d]: I'm sure someone has a patent for it
08:01karolherbst[d]: gfxstrand[d]: I said fma
08:03dadschoorse[d]: HdkR: as far as I can tell, that's not the case for amd. at least VOPD fma with constant multiplier has more restrictions on the third source than add has on either sources
08:04karolherbst[d]: yeah, that was also the case for nvidia, but no need for pointless restriction if your ISA is 128 bit based
08:07HdkR: Wouldn't say it's pointless if it lets you reuse the FMA pipeline without burning new die space :D
08:08karolherbst[d]: yeah, it makes total sense for FMA to have its own thing 😄
08:08karolherbst[d]: though, FMUL and FADD are still FMA internally
08:09karolherbst[d]: FSWZADD as well btw :ferrisUpsideDown:
08:09karolherbst[d]: what I'm just confused about is why FMA <-> alu has a different latency req than fma <-> fma and alu <-> alu
08:09karolherbst[d]: but maybe that's just an ampere thing
08:20dadschoorse[d]: dadschoorse[d]: maybe it's still the same ALU in the end, but the read ports are different
09:01tiredchiku[d]: FDO gitlab is back up
09:05karolherbst: well....
09:29tiredchiku[d]: I finally have a game that matches its performance to what I had on the proprietary driver: Disco Elysium
10:21HdkR: nice nice
10:39tiredchiku[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1247499915471487028/image.png?ex=66604055&is=665eeed5&hm=585332aed0367efdcb64d9a2af17b17aea0737daf14decc7016535082eb2bf4f&
10:39tiredchiku[d]: the bane of hopping between drivers:
10:39tiredchiku[d]: I play SF6 on nvprop because NVK perf isn't there yet, and well
10:39tiredchiku[d]: every time I want to test it on NVK I have to spend 20 mins waiting for pre-warming, then 20 more when I go back to nvprop
10:49ahuillet[d]: is there no way to disable the pre-warming? I remember on DXMD seeing 20+ minutes of shader compilation times before even making it to the main menu
10:50ahuillet[d]: sure you have no stutter but your gaming session is over before it could start...
10:50tiredchiku[d]: there is a way to disable it, yes
10:51tiredchiku[d]: but that makes nvk perf worse
10:51tiredchiku[d]: + I don't really plan when I'm gonna boot up a game on NVK e-e
11:31notthatclippy[d]: Does NVK hit GSP when compiling shaders?
11:35tiredchiku[d]: I'm not sure
11:36tiredchiku[d]: tbh I'm not even sure what you mean by that, so 😅
11:44ahuillet[d]: hopefully not but how would she confirm?
11:44notthatclippy[d]: The obvious way to parallelize shader compilation is to spawn $nproc threads that each compile one shader. Basically the `make` paradigm. But, if each process needs to communicate with GSP for whatever reason, you will end up serializing on the message queue and get worse perf.
11:45ahuillet[d]: (oh, NVK not blob, sorry)
11:45notthatclippy[d]: What can happen is that a beefier (core-ier) CPU compiles shaders slower because it's more threads contending on the single GSP core
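A rough way to picture the contention described above is a toy model in which every compile job does some freely parallel CPU work plus a short request to a resource that serves one caller at a time. All numbers below are invented inputs, not measurements, and the model only captures the plateau; it does not include the extra overhead that could make a wider CPU actually slower.

```python
def batch_time(jobs, workers, compile_ms, gsp_ms):
    """Wall time in ms to finish `jobs` compiles on `workers` threads.

    compile_ms: per-job CPU work that parallelizes freely (made-up input)
    gsp_ms:     per-job time on a shared single-core resource (made-up input)
    """
    parallel_bound = jobs * (compile_ms + gsp_ms) / workers
    serial_bound = jobs * gsp_ms  # the single core has to serve every job
    return max(parallel_bound, serial_bound)


for workers in (1, 4, 16, 64):
    print(workers, batch_time(jobs=640, workers=workers, compile_ms=900, gsp_ms=100))
```

Once the serialized part dominates, adding workers no longer shortens the batch.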
11:45notthatclippy[d]: I know the answer for the blob and you're not going to like it, but it's offtopic here
11:45HdkR: The compiler definitely doesn't need to communicate with GSP
11:45karolherbst[d]: also
11:46karolherbst[d]: mesa spawns such threads ahead of time generally
11:47notthatclippy[d]: karolherbst[d]: As in separate process that you IPC the work to?
11:48karolherbst[d]: no
11:49karolherbst[d]: just a thread within the application
11:49karolherbst[d]: well
11:49karolherbst[d]: multiple
11:49notthatclippy[d]: Well, you're not getting much out of that if the app compiles all shaders on load..
11:49karolherbst[d]: I think people were thinking of having an mesa IPC thing, but it's really hard to get right
11:49karolherbst[d]: notthatclippy[d]: mhh?
11:49notthatclippy[d]: karolherbst[d]: We also decided against it for the blob
11:49karolherbst[d]: you still compile in threads
11:50karolherbst[d]: sometimes
11:50karolherbst[d]: we do have worker threads to handle all of the shader caching though
11:50notthatclippy[d]: karolherbst[d]: I may be projecting our own problems to your stack here, but any serialization among those threads will hurt you. The blob has some on init.
11:51karolherbst[d]: ohh fair
11:51karolherbst[d]: that part is kinda up to drivers on how to do anyway
11:52notthatclippy[d]: My advice is to cache in kernelmode absolutely everything that is feasibly cacheable
11:52karolherbst[d]: and it has diminishing returns in the first place, so having more than let's say 4 or 6 threads is already kinda... meh
11:52notthatclippy[d]: notthatclippy[d]: And obviously serve it without locks.
11:52karolherbst[d]: sure
11:52karolherbst[d]: but we don't spawn threads on the fly most of the time
11:52karolherbst[d]: so not sure how it's really relevant
11:52karolherbst[d]: I don't even know if anybody actually spawns threads besides when creating an API context object
11:53notthatclippy[d]: We had issues with dxvk shader compilation in steam when you run the game.
11:53notthatclippy[d]: Steam ends up spawning a bunch of fossilize processes and/or threads, and each initializes a basic vk context
11:53karolherbst[d]: right
11:53notthatclippy[d]: And that has serializing moments
11:54karolherbst[d]: yeah, fair
11:54karolherbst[d]: so it's driver internal serialization between vk contexts?
11:55karolherbst[d]: I'm just trying to figure out if the problem is context switching between those threads, or simply creating those
11:55notthatclippy[d]: karolherbst[d]: The biggest pain point was when vk context creation has to talk to GSP, because it's single core
11:55karolherbst[d]: okay
11:55karolherbst[d]: well.. nvk doesn't do that afaik
11:55karolherbst[d]: I think creating a VkDevice is supposed to be cheap
11:56karolherbst[d]: and the GPU channel is tied to the physical device
11:56karolherbst[d]: but I might be wrong
11:56notthatclippy[d]: That's awesome then. I'm not sure exactly where it was in our UMD, I only looked at it from the kernel side
11:56karolherbst[d]: or maybe it's tied to the vkDevice?
11:56karolherbst[d]: uhhh
11:56karolherbst[d]: dunno 🙂
11:56notthatclippy[d]: Should be easy to check though: run $nproc fossilize apps and see if they scale properly
11:58karolherbst[d]: mhhhh
11:59notthatclippy[d]: Or just put a kprobe on GSP communication and see if it's hit
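The scaling check suggested above could be scripted roughly like this. It is a minimal sketch: the command is a placeholder Python one-liner, not a real fossilize invocation, and `time_parallel` is a hypothetical helper name.

```python
import subprocess
import sys
import time


def time_parallel(cmd, copies):
    """Start `copies` instances of `cmd` at once and return the wall time in seconds."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    for proc in procs:
        proc.wait()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder workload: substitute the real replay command line here.
    cmd = [sys.executable, "-c", "sum(range(200000))"]
    for copies in (1, 2, 4):
        print(copies, round(time_parallel(cmd, copies), 3))
```

If the wall time stays roughly flat while `copies` is below the core count, the processes scale; if it grows with `copies`, something is serializing them.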
12:26magic_rb[d]: I've been noticing much worse battery life on nvk+gsp than on fully proprietary nvidia, is there something I can play with to make it better or is it a known problem rn
12:29karolherbst[d]: probably needs more optimization and all of that
12:31notthatclippy[d]: For one, r535 gsp.bin was still targeting datacenter / always-on GPUs primarily
12:39tiredchiku[d]: I have a question
12:40tiredchiku[d]: [sidpr@constructor ~]$ ls /lib/firmware/nvidia/tu116/gsp/
12:40tiredchiku[d]: booter_load-535.113.01.bin.zst booter_unload-535.113.01.bin.zst bootloader-535.113.01.bin.zst gsp-535.113.01.bin.zst
12:40tiredchiku[d]: afaik nv only provides the GSP image... where do the other 3 images come from?
12:43notthatclippy[d]: tiredchiku[d]: They're embedded as byte arrays in a .c file in openrm
12:43notthatclippy[d]: There's a script to extract them
12:43tiredchiku[d]: I see
12:43tiredchiku[d]: where can I find this script? for science reasons c:
12:45tiredchiku[d]: (I wanna try replacing those with the firmware in 555
12:45tiredchiku[d]: imagine I closed the bracket in the previous message)
12:47asdqueerfromeu[d]: tiredchiku[d]: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/nouveau/extract-firmware-nouveau.py
12:47tiredchiku[d]: thanks :saigeheart:
13:05tiredchiku[d]: ah, GSP init fails
13:40dwfreed: karolherbst: they do have stable v6 addresses, yeah
13:40karolherbst: dwfreed: and they are all unique?
13:40dwfreed: yes
13:40karolherbst: okay
13:44karolherbst: thanks for confirming :)
13:44karolherbst: not 100% it's a bot, but like.... *sigh*
13:44dwfreed: their github is wild too
13:44karolherbst: yeah
13:44dwfreed: gonna promote that to a network-level
13:44karolherbst: I've reported them on github as well
13:45karolherbst: they left a comment here: https://gist.github.com/karolherbst/2b07364246ab7e101e7eebca3086921a
13:45karolherbst: that's how I got suspicious :D
13:45dwfreed: yeah, saw that
13:47karolherbst: spam is really getting worse. Even on gitlab I think I saw some bots writing mostly coherent things, but then random words link to spam/companies/products/whatever
13:59gfxstrand[d]: karolherbst[d]: Nope. We have a channel per queue and a DMA channel per device. If you don't create any queues, you won't get any 3D or compute channels, though.
14:01gfxstrand[d]: And we can probably avoid that one for the zero queue case
14:04karolherbst[d]: ahh
14:05gfxstrand[d]: Without that, the queues wouldn't actually run independently.
14:05karolherbst[d]: yeah, makes sense
14:10gfxstrand[d]: It might make sense to add a `VK_DEVICE_CREATE_COMPILER_ONLY` flag where you can only create descriptor set layouts, samplers, pipelines, and shaders. No queues, no device memory.
14:10karolherbst[d]: yeah...
14:10karolherbst[d]: I think offloading compilation is something applications like to do
14:10gfxstrand[d]: We wouldn't even have to open the device FD
14:11gfxstrand[d]: Yeah, especially with ESO
14:11gfxstrand[d]: zmike: ^^
14:11karolherbst[d]: could also lazy initialize stuff until the first queue was created
14:12themaister[d]: gfxstrand[d]: would be nice for Fossilize
14:12karolherbst[d]: if feasible
14:12gfxstrand[d]: karolherbst[d]: Queues are created with the device
14:12karolherbst[d]: ahh...
14:12karolherbst[d]: nvm then
14:20zmike[d]: what
14:20zmike[d]: gfxstrand[d]: y u ping
14:21karolherbst[d]: zmike[d]: `VK_DEVICE_CREATE_COMPILER_ONLY`
14:21zmike[d]: and?
14:21zmike[d]: or are you saying I should do this
14:23zmike[d]: what's the purpose of this aside from foz
14:23zmike[d]: objects created on one device can't be shared with another device
14:25karolherbst[d]: couldn't zink also use this for compiling in parallel?
14:25zmike[d]: no
14:25zmike[d]: see above
14:25karolherbst[d]: mhh
14:25zmike[d]: this could only be used for fully offline compilation
14:26zmike[d]: or I guess engines could have a device that they use to precompile and warm the cache for the main part of the engine that re-creates pipelines and associated objects with cache hits?
14:26zmike[d]: idk how that's different or more useful than just creating the same pipelines/objects in a thread off the main device though
14:38gfxstrand[d]: zmike[d]: Nope, just making you aware since it'll probably be maintenance
14:39zmike[d]: :stressheadache:
15:30notthatclippy[d]: Oh, I think it is coming back to me now (sorry, was on an 8 month vacation since then). IIRC Valve captures API traces somehow (not sure if from end users or in their testing) and then coalesces these into fossilize files based on whether they generate different shader binaries. Then, when you start a dxvk game, steam uses fossilize-replay to replay the whole API up until the point where it spits
15:30notthatclippy[d]: out a shader. IIRC it spawns `$nproc/2` processes, with `--num-threads={1,2,4}` each (no clue how they divined that split), and each one of them is replaying the whole API.
15:32notthatclippy[d]: They are probably filtering out _some_ API calls, but I really doubt you'll be able to skip creating the physical device from just their API calls.
15:32notthatclippy[d]: But maybe they'd be open to change how the API is captured/replayed if it is causing issues for mesa ¯\_(ツ)_/¯
15:34notthatclippy[d]: The one thing I remember is that we had an issue with initial startup scaling where having more cores meant more apps starting and then contending on the GSP, causing a thundering herd kind of problem. I think they might have solved it by just adding a `sleep(50ms)` after each process spawn to let it do the early init in peace.
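The staggered start described above (a short sleep after each spawn) might look something like this. It is only a sketch: the 50 ms default comes from the message, while `spawn_staggered` and the placeholder worker commands are made up for illustration.

```python
import subprocess
import sys
import time


def spawn_staggered(commands, delay_s=0.05):
    """Start each command, pausing `delay_s` seconds after every spawn."""
    procs = []
    for cmd in commands:
        procs.append(subprocess.Popen(cmd))
        time.sleep(delay_s)  # give this process time to do its early init alone
    return procs


if __name__ == "__main__":
    # Placeholder workers; a real launcher would start its replay processes here.
    workers = [[sys.executable, "-c", "pass"] for _ in range(4)]
    for proc in spawn_staggered(workers):
        proc.wait()
```

The delay trades a little startup latency for not having every process hit the same serialized init path at once.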
16:01ahuillet[d]: themaister[d]: you still need to instantiate enough from the kernel to know the GPU target though
16:03gfxstrand[d]: Yes
16:03gfxstrand[d]: So if they fire off lots of processes, there's not much we can do
16:03notthatclippy[d]: Well, if all the info is in the kernel, you can generally avoid most serialization.
16:03ahuillet[d]: but it doesn't have to be as heavy weight as what mtijanic was describing, in theory
16:04gfxstrand[d]: Yeah, right now we have to create actual channels to get that information from the kernel. 😩
16:04ahuillet[d]: the Mesa stack could just have one process read the register to know the target, feed that to NAK, and let NAK run without having created actual resources on the GPU
16:04gfxstrand[d]: I've had "please cache in the kernel and give me a bare query" on my nouveau.ko wishlist for a while.
16:04notthatclippy[d]: gfxstrand[d]: Yeah, that's a problem. No amount of smart locking models is gonna help there when you hit a single core microprocessor at the end of the tunnel.
16:04karolherbst[d]: this query exists, no?
16:05notthatclippy[d]: Anything cacheable, such as GPU capabilities should just be cached in the kernel.
16:05gfxstrand[d]: Nope. In order to get class versions and stuff we have to create a context
16:05karolherbst[d]: oh wait.. you need a channel to query the classes
16:05karolherbst[d]: well.. don't need the classes to know what to compile against :ferrisUpsideDown:
16:05gfxstrand[d]: Yeah, we really want nouveau.ko to just create one of each channel on boot and cache the info
16:05ahuillet[d]: yes, but that can be cached in the kernel after the first channel ever was created on that GPU, right?
16:05gfxstrand[d]: It can, it just requires UAPI we don't have
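The "query once, then serve from cache" idea discussed above can be illustrated with a userspace toy. This is not kernel code and does not reflect any existing nouveau UAPI; `DeviceInfoCache`, `slow_query` and the returned fields are all hypothetical.

```python
import threading


class DeviceInfoCache:
    """Run an expensive query once, then serve the stored result."""

    def __init__(self, query):
        self._query = query  # hypothetical slow path, e.g. one that needs a channel
        self._lock = threading.Lock()
        self._info = None

    def get(self):
        info = self._info
        if info is not None:  # fast path: plain read, no lock taken
            return info
        with self._lock:  # slow path: only taken until the first fill
            if self._info is None:
                self._info = self._query()
            return self._info


calls = []


def slow_query():
    calls.append(1)  # count how often the expensive path runs
    return {"chipset": "placeholder", "classes": ("placeholder",)}


cache = DeviceInfoCache(slow_query)
first = cache.get()
second = cache.get()
```

After the first call, every later `get()` returns the same stored object without taking the lock.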
16:05tiredchiku[d]: notthatclippy[d]: can't confirm $nproc/2
16:06tiredchiku[d]: whenever I enabled shader pre-caching it only spawned 1 fossilize thread
16:06ahuillet[d]: gfxstrand[d]: right, maybe that's what needs adding, although how useful is it if apps aren't using them
16:06gfxstrand[d]: All of this is 100% technically feasible, it just needs someone to spend some time hacking on nouveau.ko
16:07gfxstrand[d]: karolherbst[d]: No, but we need it to enumerate physical device properties
16:07karolherbst[d]: mhh..
16:07karolherbst[d]: right
16:08notthatclippy[d]: For the blob, we've actually spent several eng-months on the various caching and other lockless solutions, and we are still not free of UMD->GSP communication in this path.
16:08tiredchiku[d]: I had to set unShaderBackgroundProcessingThreads 8 in ~/.steam/steam/steam_dev.cfg to get it to use all 8 threads on my laptop
16:08karolherbst[d]: there is `nv_device_info` or what it was called
16:09karolherbst[d]: `nv_device_info_v0` but it doesn't contain much
16:10ahuillet[d]: at any rate, it doesn't seem like the most pressing problem ATM
16:10notthatclippy[d]: Yeah, I was just wondering if that's the reason why shader compiling was slow for Sid, since it's a non-obvious thing we've hit before.
16:11tiredchiku[d]: no I just have a poor CPU that thermal throttles
16:11tiredchiku[d]: i5-9300H
16:12notthatclippy[d]: If anyone is particularly bored, I can share a few greps to use on openrm to see which kinds of data the NV driver caches in kernel, which you can use as a baseline for a potential nouveau.ko improvement.
16:12tiredchiku[d]: turbo clock all core load is supposed to be 4 GHz, but it struggles to maintain 3.2
16:12tiredchiku[d]: maybe 3.6 on a good day
16:12ahuillet[d]: that sounds normal though?
16:12ahuillet[d]: abnormal thermal throttling would be 800MHz (yes, I've seen it...)
16:12tiredchiku[d]: it *used* to be able to sustain 4GHz .-.
16:13ahuillet[d]: oh :/
16:13tiredchiku[d]: ahuillet[d]: have seen that while messing with BIOS settings too
16:14tiredchiku[d]: I have that cursed bios that allows me to mess with more things than a laptop would allow too
16:16tiredchiku[d]: but yeah, I distinctively remember my CPU sustaining 4GHz, and now no matter what I do it won't do it again 😅
16:55ahuillet[d]: good thing you're getting a new machine then
16:56asdqueerfromeu[d]: Investigating the poor frametime issues without `NVK_DEBUG=push_sync` option enabled would be nice
17:10redsheep[d]: asdqueerfromeu[d]: That makes it go away? Does that come with a perf hit?
17:12redsheep[d]: I've had quite a bit of stutter as well but I just assumed it was a combination of rendering sometimes hitting something really slow, and my screen being stuck at 60hz making it feel worse
17:12tiredchiku[d]: asdqueerfromeu[d]: is that enabled by default?
17:17tiredchiku[d]: gfxstrand[d]: does your bindless ubo branch need any work to be able to interact with stuff on the cbuf branch
17:17tiredchiku[d]: or can I create patchsets out of the work done on both and apply those to current HEAD, for :wolfFIRE: purposes
17:38asdqueerfromeu[d]: tiredchiku[d]: No
17:41tiredchiku[d]: oh
17:42tiredchiku[d]: OH
17:42tiredchiku[d]: enabling that option makes it go away, interesting
17:43magic_rb[d]: redsheep[d]: runs good! plasma itself, is a bit, meh, well its nice, but going from xmonad to this is quite a jump lmao. I also tried the proprietary driver just for the hell of it, it booted, worked, opengl and vulkan, then started chugging due to explicit sync probably and then proceeded to segfault into a blinking capslock key :)
17:43magic_rb[d]: but yeah on nouveau it works quite well
17:44tiredchiku[d]: on a laptop... are you sure the session isn't being run by your iGPU?
17:44magic_rb[d]: Ofc
17:45magic_rb[d]: I do have hdmi connected to the dgpu tho
17:45magic_rb[d]: So i was mostly testing reverse prime
17:45tiredchiku[d]: ah, HDMI, then okay 😅
17:46magic_rb[d]: Hdmi on the dgpu has been an experience, on nixos using it causes a segfault, with proprietary nvidia
17:46tiredchiku[d]: f
18:07gfxstrand[d]: tiredchiku[d]: You're welcome to try but it currently comes with no warranty. 😅
18:09tiredchiku[d]: that's fair, I just wanna see what happens :p
18:09tiredchiku[d]: curiosity driven testing
19:27tiredchiku[d]: gonna have to do some rebasing to make it apply, hmmm
19:41tiredchiku[d]: that was actually a very painless cherry-pick
19:42tiredchiku[d]: this is gonna be fun
19:42tiredchiku[d]: so many patches on top of HEAD
19:43tiredchiku[d]: !29154, !29405, cbuf0, and bindless-ubo
19:43tiredchiku[d]: :wolfFIRE:
19:47gfxstrand[d]: Does `uimad` just have a different latency for shits and giggles?!?
19:48HdkR: It makes both u and i mad
19:49gfxstrand[d]: That's not gonna be utter hell for WaW dependencies...
19:49gfxstrand[d]: 🤦🏻♀️
19:49gfxstrand[d]: Like, seriously, how can you do 32 IMADs in 6 cycles but it takes 8 to do one. 😩
19:52gfxstrand[d]: Yeah, I'm going to have to burn my scheduler to the ground.... again. 😭
20:00tiredchiku[d]: oh a compile error
20:00tiredchiku[d]: guess I won't be building on latest HEAD
20:00tiredchiku[d]: but anyway, that's for me to figure out tomorrow
20:04karolherbst[d]: gfxstrand[d]: imad.wide, yes
20:05karolherbst[d]: gfxstrand[d]: the complexity of reality is kinda 50x what you have atm in terms of how the latencies are expressed 😛
20:05karolherbst[d]: there are multiple tables
20:05karolherbst[d]: and the type of instructions matters
20:05karolherbst[d]: two ways
20:05karolherbst[d]: so e.g.
20:06karolherbst[d]: alu <-> alu has a 4 latency for raw, but fma <-> alu is 5
20:06karolherbst[d]: and a lot of details like that
20:07karolherbst[d]: but fma also includes fadd,fmul and _some_ others
20:07karolherbst[d]: it's super complex
20:07karolherbst[d]: so if you want to know what's the latencies you have to look at the category of both sides, look up some weirdo table
20:08karolherbst[d]: and then it will tell you
20:08dadschoorse[d]: what about fma -> fma
20:08karolherbst[d]: also 4
20:08karolherbst[d]: and imad is extra special
20:08karolherbst[d]: because there it also matters which source
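The "look up the category of both sides in a table" scheme described above could be sketched like this. Only the three RAW numbers (alu/alu 4, fma/alu 5, fma/fma 4) and the note that fadd and fmul count as fma come from the discussion; everything else, including which opcodes are "alu", is an assumption for illustration, and the imad per-source special case is not modelled.

```python
# RAW latencies quoted in the discussion (Ampere), keyed by (producer, consumer) category.
RAW_LATENCY = {
    ("alu", "alu"): 4,
    ("fma", "fma"): 4,
    ("fma", "alu"): 5,
    ("alu", "fma"): 5,  # assumption: "fma <-> alu" is read as applying both ways
}

# fadd and fmul are said to count as fma; the "alu" entries are illustrative guesses.
CATEGORY = {
    "ffma": "fma",
    "fadd": "fma",
    "fmul": "fma",
    "iadd3": "alu",  # assumption
    "lop3": "alu",  # assumption
}


def raw_latency(producer, consumer):
    """Cycles to wait between a producer op and a consumer that reads its result."""
    return RAW_LATENCY[(CATEGORY[producer], CATEGORY[consumer])]


print(raw_latency("fadd", "iadd3"))
print(raw_latency("ffma", "fmul"))
```

A real scheduler would need several such tables (RAW, WaW, and so on) and finer categories, as the conversation points out.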
20:09gfxstrand[d]: karolherbst[d]: Doesn't seem to need .wide
20:09karolherbst[d]: ehh wait, you also talk about uimad
20:10karolherbst[d]: but anyway
20:10karolherbst[d]: that should be 4 :ferrisUpsideDown:
20:10karolherbst[d]: mhh
20:10karolherbst[d]: what relationship do they have?
20:10karolherbst[d]: waw?
20:10gfxstrand[d]: standard raw
20:11karolherbst[d]: on ampere that should be 4
20:11gfxstrand[d]: 🤷🏻♀️
20:11karolherbst[d]: between two uimads that is
20:11gfxstrand[d]: Oh, it was uimad into uiadd3 or something like that
20:11karolherbst[d]: if you have uimad and something else, it might not be 4
20:12karolherbst[d]: should be the same
20:12karolherbst[d]: (an ampere that is)
20:12karolherbst[d]: *on
20:12gfxstrand[d]: 🤷🏻♀️
20:12karolherbst[d]: however, how do you get the initial value?
20:12gfxstrand[d]: I've already moved onto the next test
20:12gfxstrand[d]: We're going for "good enough" right now. I'll unhack it later
20:13karolherbst[d]: yeah, fair
20:24phomes_[d]: phomes_[d]: nvdump works for me now. The trick was to do a release build. With rust 1.78 spirv-reflect asserts in debug due to https://blog.rust-lang.org/2024/05/02/Rust-1.78.0.html#asserting-unsafe-preconditions
20:47gfxstrand[d]: Ugh... More RaW dependencies because of different instruction latencies. 😩
20:47gfxstrand[d]: phomes_[d]: Do you have a backtrace for that? It might be pretty easy to fix
20:50skeggsb9778[d]: gfxstrand[d]: That's not true any longer in the full version of the patch series I recently reposted a shorter version of, all the runlist/engine/class mappings are returned up front when allocating the device
20:50skeggsb9778[d]: Not exposed to userspace, as the series is just trying to clean up the kernel APIs, but it's a step towards it
20:52gfxstrand[d]: Cool.
20:53phomes_[d]: gfxstrand[d]: https://paste.centos.org/view/9490204c
20:54gfxstrand[d]: Ugh... it's deep inside spirv-reflect. 😭
20:54gfxstrand[d]: And that library isn't really maintained anymore
20:55redsheep[d]: Oh yeah, now that I know how to get better backtraces I can probably find what blows up doom eternal on debug builds
20:56gfxstrand[d]: I don't think Grahm cares. I could probably get him to toss me the reins but IDK that I want them.
20:56gfxstrand[d]: (to spirv-reflect)
21:09redsheep[d]: Hmm coredumpctl doesn't show anything after the doom eternal crash, and it still doesn't give any dmesg output
21:17gfxstrand[d]: Bummer
21:21redsheep[d]: Is there some magic I can do to get the dump, something I need to enable? Last time we dug into the logs I'm pretty sure it had been segfaulting
21:22gfxstrand[d]: I don't know a lot of tricks for debugging wine stuff, unfortunately.
21:24HdkR: gdb attach before it explodes is nice
21:40redsheep[d]: Hmm. Seems debugging with wine isn't trivial. I managed to attach but it seems like either that prevented the crash progressing naturally, or it just hates attaching to wine. Found some reddit threads claiming it's possible though, maybe I can still make it work
21:42redsheep[d]: gfxstrand[d]: I could have sworn you managed to do this a few months ago when dealing with a big report for some wine game, I think zmike helped
21:44redsheep[d]: I recall there being a script involved... I'll find it.
21:53redsheep[d]: karolherbst[d]: Karol just found a script you had suggested when I was debugging this last, how do I actually use this? It looks like this might go in my home folder, and what, it just automagically makes gdb work better when attaching to wine?
21:53karolherbst[d]: redsheep[d]: I don't know anymore
21:56redsheep[d]: Alright, I'll just wing it
21:58redsheep[d]: Oh yeah, this was doing a stack overflow, not a segfault. I remember now why I stopped debugging this
22:00redsheep[d]: Yep that is a sorry excude for a backtrace.
```
#0 0x00006ffffff59230 in RtlAllocateHeap () from target:/home/jared/.local/share/Steam/steamapps/common/Proton - Experimental/files/lib64/wine/x86_64-windows/ntdll.dll
No symbol table info available.
#1 0x0000000000000000 in ?? ()
No symbol table info available.
```
22:01redsheep[d]: excuse*
22:44redsheep[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1247682361144119327/gdb.txt?ex=6660ea3f&is=665f98bf&hm=45fb66c9da3317b52b0c3d962afc52f0e45b1be4134334e42b510e5061283160&
22:44redsheep[d]: Okay I finally got it. That was about 400x more complicated to get than any other usage of gdb that I have had to do before, but I understand it much better now.
22:45redsheep[d]: gfxstrand[d]: Somewhere in there ^ should be what is crashing debug NAK on doom eternal
22:45HdkR: That's a spicy backtrace
22:46gfxstrand[d]: The good news is that it's a shader compile so if you can get a fossil out of it, I can repro locally
22:47alpha3427[d]: What do I need to be familiar with to implement gl 4.4 4.5 4.6 on nouveau?
23:01pavlo_kozlenko[d]: Who's in charge of opengl here?
23:01pavlo_kozlenko[d]: Ask Him
23:01pavlo_kozlenko[d]: 4.4 4.5 seems to be implemented, but some crash in tests
23:02pavlo_kozlenko[d]: No one told me the details, I'm sorry
23:03pavlo_kozlenko[d]: karolherbst[d]: Is it possible to build mesa with experimental support gl 4.4 and 4.5?
23:03pavlo_kozlenko[d]: Doesn't such a build flag exist?
23:14redsheep[d]: Honestly, given that zink on nvk is already gl 4.6 conformant with Turing and up I am not sure what the point would be. You likely only need the very latest gl for high performance applications, and the serious performance issues with Maxwell and Kepler aren't about to go away, and it appears the only other generation where 4.6 is likely possible is Fermi, which is 14 years old.
23:15redsheep[d]: If the nouveau gl maintainers haven't managed it in the last decade it's not going to be easy, so it is likely quite a lot of effort for not very many gpus it would run on.
23:17gfxstrand[d]: pavlo_kozlenko[d]: You don't need to. `MESA_GL_VERSION_OVERRIDE=4.6` and `MESA_GLSL_VERSION_OVERRIDE=460` should do the trick.
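One way to apply the two overrides mentioned above to a single program, rather than the whole session, is to build a modified environment and pass it to the child process. A small sketch, with `gl_override_env` being a made-up helper name:

```python
import os


def gl_override_env(gl_version="4.6", glsl_version="460", base=None):
    """Return a copy of the environment with Mesa's version overrides set."""
    env = dict(os.environ if base is None else base)
    env["MESA_GL_VERSION_OVERRIDE"] = gl_version
    env["MESA_GLSL_VERSION_OVERRIDE"] = glsl_version
    return env


print(gl_override_env(base={}))
```

The result can be handed to something like `subprocess.run(["glxinfo", "-B"], env=gl_override_env())`, assuming `glxinfo` is installed. The overrides only change the advertised version; they do not make missing features work.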
23:19redsheep[d]: I suppose something with lower overhead than zink would be nice for current gpus, but that has been discussed and I think the conclusion has always been rewrite the driver top to bottom.
23:19gfxstrand[d]: But also, yeah, the delta between where nouveau is now and 4.6 conformance is a pretty long tail and probably involves reworking it to use NAK and finishing NAK off for Maxwell.
23:19gfxstrand[d]: At which point, you may as well burn it and start over
23:25redsheep[d]: gfxstrand[d]: Would it be fair to say it only makes sense for anyone to bother once you have NAK closer to prop driver performance so we can get a better idea of how much of a difference a native driver could make?
23:26gfxstrand[d]: Oh, NAK is already beating codegen across the board.
23:26redsheep[d]: I think we don't have a good way to know how good zink will be until then
23:26gfxstrand[d]: Like, I don't think the Zink issues are NAK.
23:26redsheep[d]: Oh well, yes of course. I mean compared to prop nvidia opengl
23:26gfxstrand[d]: Yeah, I don't think it makes sense to re-build GL on perf terms until NVK is closer to blob perf.
23:26redsheep[d]: In terms of performance, not defects, yeah.
23:27gfxstrand[d]: Like if nouveau GL is beating Zink+blob somewhere, then maybe you have an argument but I really doubt it is.
23:27gfxstrand[d]: Even with defects, there's a question of whether it's easier to fix, write and maintain new, or just fix Zink.
23:30redsheep[d]: Yeah working towards continued performance and nvk+zink fixes seems like a good way to go. Your performance work is already fixing games as a side effect, as some people have shown with the bindless ubo branch.
23:32alpha3427[d]: gfxstrand[d]: What plans do you have for Kepler?
23:32gfxstrand[d]: Hope someone comes along and writes patches
23:33alpha3427[d]: Vulkan for kepler broken?
23:33gfxstrand[d]: A bunch of stuff works but it's a long ways from passing all the tests
23:33gfxstrand[d]: And the new compiler doesn't support Kepler at all yet.
23:35alpha3427[d]: I've heard from a lot of people that vulkan on Kepler is made through churches and it's some kind of broadcast. What do you think about it?
23:36gfxstrand[d]: I don't know what you mean by that.
23:40pavlo_kozlenko[d]: I've heard from my friends that vulkan on kepler is partially implemented in software, and other architectures like maxwell> already support vulkan fully, sorry for the misunderstanding
23:41gfxstrand[d]: No, Vulkan on Kepler is no different than any other hardware
23:41gfxstrand[d]: It just doesn't support everything Maxwell does
23:44airlied[d]: I think nouveau GL is already 4.5 conformant (maybe even 4.6) except for one or two corner case tests but also it has some general instability in multithreading that it made little sense to bother putting it through formal conformance
23:49redsheep[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1247698739720622111/fossilize.41c845bd66f73301.1.foz?ex=6660f980&is=665fa800&hm=00ebdc97f62dde09e9e3d7d0f7ae22277d8d33ef6f4bbe453c3fabe0a818f9dd&
23:49redsheep[d]: gfxstrand[d]: That was unreasonably hard to get it to output
23:49gfxstrand[d]: redsheep[d]: Can you attach that to a bug?
23:49redsheep[d]: Alright