00:00mohamexiety[d]: HdkR: Yeah, RDNA 4 onwards
00:05karolherbst[d]: none of it is automatic right? Like applications need to explicitly mark a buffer as compressible for the driver to enable it, right?
00:05karolherbst[d]: is there a thing for vulkan?
00:06mohamexiety[d]: no the whole point is all of it is automatic and transparent
00:07x512[m]: Is it possible to upload compressed buffer from CPU to reduce PCI bandwidth?
00:07mohamexiety[d]: (assuming we’re talking about bandwidth compression here)
00:08mohamexiety[d]: x512[m]: The host doesn’t know the layout so I don’t think so, but not actually sure if maybe there’s some way to cleverly do it
00:08karolherbst[d]: mohamexiety[d]: I mean compression buffers
00:08karolherbst[d]: *compressing
00:08karolherbst[d]: because you know..
00:08karolherbst[d]: buffers generally don't compress as well as images
00:10HdkR: Yea, it's fully transparent so other engines on the bus can use the compression as well.
00:10x512[m]: I have use case of uploading 8 bit fill masks that are greatly compressed with RLE-like algorithm.
00:10karolherbst[d]: HdkR: that's not what I meant tho
00:11karolherbst[d]: like you need to mark the memory as compressible, but if it can't be compressed, you pay a heavy penalty
00:11karolherbst[d]: so e.g. if you'd turn it on on a buffer that's nearly impossible to compress you'd see worse performance
00:14HdkR: From the chips and cheese article on it, it seems to talk about it being ubiquitously enabled since all the hardware blocks just know how to consume it.
00:14HdkR: Or at least imply that it is
00:18HdkR: Granted I haven't looked at their PTE configurations to see if the bit can be toggled or not. Mining their kernel headers doesn't sound like fun :P
00:18karolherbst[d]: the point is, compression isn't magic
00:18HdkR: Definitely not
00:18karolherbst[d]: sure, all hw blocks can deal with it, because how it's implemented is that they don't have to know it
00:18karolherbst[d]: or rather, it happens in the memory unit
00:19karolherbst[d]: and the blocks consuming it just get the real data
00:20HdkR: https://old.chipsandcheese.com/2025/09/13/amds-rdna4-gpu-architecture-at-hot-chips-2025/ Has the cool deets. "compression" units past the infinity fabric. Compressed data living all the way down to L2 in the CUs. I like it :)
00:21mhenning[d]: I've actually been really confused about some of the details of compression.
00:21mhenning[d]: If the compression is lossless, then the pigeonhole principle mandates that certain inputs become larger, right? but where does that extra data go?
00:21mhenning[d]: do we allocate more memory for compressed images?
00:21karolherbst[d]: nope
00:22karolherbst[d]: the compressed data never hits VRAM afaik
00:23gfxstrand[d]: There are extra metadata pages in VRAM
00:23mohamexiety[d]: Yeah
00:24mohamexiety[d]: This is why it needed kernel patches (and why it’s impossible to do for older stuff because management of the metadata was done by firmware pre Turing)
00:24karolherbst[d]: yeah.. there is stuff in the page tables for it
00:24gfxstrand[d]: Everything that's compressed technically consumes *more* memory than uncompressed. But it's able to use less bandwidth most of the time.
00:25airlied[d]: I think the push for buffer compression comes from tensors
00:25gfxstrand[d]: Probably
00:25karolherbst[d]: yeah in theory, but I don't think that's how it works on nvidia
00:26gfxstrand[d]: Not sure what you're disagreeing with
00:26karolherbst[d]: using more memory
00:26gfxstrand[d]: Yes, it does
00:26mhenning[d]: gfxstrand[d]: okay, that makes sense
00:26mohamexiety[d]: It has to use more memory because of the extra metadata
00:26karolherbst[d]: I mean.. okay sure, but how much data is that?
00:26gfxstrand[d]: A few bits per GOB
00:27gfxstrand[d]: It's not much
00:27karolherbst[d]: right
00:27gfxstrand[d]: But it's enough to tag the "we couldn't compress so we just stuck it in uncompressed" case.
00:27gfxstrand[d]: Same thing Intel and AMD do.
00:29mohamexiety[d]: Yeah the way it works (at least on Ada) was that it touched 28 bytes per each half GOB (256B). That was with testing it on the absolutely trivial case of an image having a single value repeated throughout all its pixels
00:29mohamexiety[d]: So you don’t really need much per GOB
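Note: the point above (a compressed surface reserves the full data size plus a small tag per block, and only the bandwidth shrinks) can be illustrated with a toy model. This is a sketch under stated assumptions, not the real hardware format: the 256-byte block size echoes the half-GOB figure mentioned above, while the `Tag` layout, the one-byte tag size, and all names are made up for illustration.

```rust
/// Toy model only: block size and tag layout are assumptions for
/// illustration, not the real NVIDIA (or AMD/Intel) hardware format.
const BLOCK_BYTES: usize = 256;

/// Per-block tag kept in a separate metadata area.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Tag {
    /// Every byte in the block has this value; a read needs no data fetch.
    Constant(u8),
    /// Compression failed; the block is stored as-is.
    Uncompressed,
}

/// Classify one block and report how many data bytes a read must transfer.
fn classify(block: &[u8; BLOCK_BYTES]) -> (Tag, usize) {
    let first = block[0];
    if block.iter().all(|&b| b == first) {
        (Tag::Constant(first), 0)
    } else {
        (Tag::Uncompressed, BLOCK_BYTES)
    }
}

/// Memory reserved per block: the full block plus its tag, whether or not
/// compression succeeded. (A 1-byte tag is an assumption.)
fn footprint_bytes() -> usize {
    BLOCK_BYTES + 1
}
```

In this model an incompressible block costs exactly the uncompressed transfer plus the tag lookup, which is how the pigeonhole objection is sidestepped: nothing ever has to grow in place.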
00:30karolherbst[d]: I wish nvidia would explain those things more publicly 😄
00:32gfxstrand[d]: 'twould be nice
00:32karolherbst[d]: how is your NDA situation going btw? 🙃
00:33gfxstrand[d]: If I were in Philly, I'd be bugging Neil about it in-person again.
00:34karolherbst[d]: heh
00:34gfxstrand[d]: We're at the "Everyone agrees. He just needs to hunt down lawyers" stage.
00:34gfxstrand[d]: :frog_upside_down:
00:35karolherbst[d]: took a while to get there lol
00:35karolherbst[d]: soo basically 10% there
00:37gfxstrand[d]: We'll see
00:38karolherbst[d]: I... have experience with nvidia lawyers and let's just say, they take their job in protecting Nvidia's IP very seriously
00:38karolherbst[d]: but I'm sure it will work out
00:46gfxstrand[d]: Yeah...
00:46gfxstrand[d]: Lawyers do be lawyerin'
03:21cubanismo[d]: Our lawyers seem to be best at not doing any work unless we absolutely force them to.
03:21cubanismo[d]: Which in most cases, probably is good lawyering
03:24HdkR: "Hey lawyers, I want to contribute back to this open-source project."..."But it would be so much easier if you just...don't do that."
04:06mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1419897765118476309/image.png?ex=68d36e57&is=68d21cd7&hm=212ad2c6bf35b0fb0a421f0c96dcd713a626707d33df5365e0616a5fac2449e9&
04:10mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1419898710132916305/image.png?ex=68d36f38&is=68d21db8&hm=a118fee4f9643b4be1b286ed1e02386b346fe75c82130b18b5252ea33b882f61&
04:10mangodev[d]: okay now this is just confusing
04:10mangodev[d]: is it because it's not in `/bin/wayland-scanner`?
04:17mangodev[d]: ik it's this commit, but it's more of meson's fault because it's meson's system
04:17mangodev[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35839
04:34kode54[d]: you need wayland-protocols as a makedep for that
04:35kode54[d]: I think? or is it in "wayland"
04:35kode54[d]: yeah, `wayland`
04:45mangodev[d]: i do.
04:45mangodev[d]: mangodev[d]: it's installed
04:45mangodev[d]: and i'm running a wayland system
07:27chikuwad[d]: okay, back to fl16-vec
07:28chikuwad[d]: I'm struggling to make heads or tails of this backtrace
07:28chikuwad[d]: https://pastebin.com/gY5kGfvx
07:28chikuwad[d]: though looking at it I _can_ tell the ssa is likely invalid?
07:28chikuwad[d]: at frame #1
07:41kestrelwx: Hi! I'm curious if #169 #406 #61 and #372 are the same issue given the shared chip.
07:46kestrelwx: Oh, #61 is something else I think, Sorry.
07:56Sid127: kestrelwx: I'm guessing you're referring to issues filed at drm/nouveau?
08:05kestrelwx: Yes, sorry.
08:06kestrelwx: Well seems they made it work in 2016, I'll check on an arch linux image from the time.
08:08chikuwad[d]: chikuwad[d]: hm no nvm the `nir_lower_mem_access_bit_sizes` pass is failing
08:14kestrelwx: I think I've managed to make it work using the parameters from 2016 issue.
08:16kestrelwx: [env] ~ $ DRI_PRIME=1 glxinfo | rg Device
08:16kestrelwx: Device: NV118 (0x1347)
08:18kestrelwx: nouveau.config=NvGspRm=1,NvClkMode=7 nouveau.runpm=0 nouveau.modeset=1 nouveau.noaccel=0
08:18kestrelwx: I guess I should ask the people who had this recently if this resolves it for them too.
08:21kestrelwx: DRI_PRIME=1 vulkaninfo | rg NVK
08:21kestrelwx: GPU id = 1 (NVIDIA GeForce 940M (NVK GM108))
08:24Sid127: nouveau.config=NvGspRm=1 does nothing on that chip btw
08:24chikuwad[d]: chikuwad[d]: progress!
08:30kestrelwx: Oh, I assumed it'd work since it's Maxwell.
08:32kestrelwx: I see it's on Turing+.
09:22phomes_[d]: I could use some rust/nak help in https://gitlab.freedesktop.org/mesa/mesa/-/issues/13953
09:27snowycoder[d]: phomes_[d]: Huh, does the shader have more than 1600 CFG blocks?
09:47phomes_[d]: the shader is a bit wild. If I read the output of spirv-cfg correctly then it has 1309
09:52snowycoder[d]: Well, that would explain it. :blobcatnotlikethis:
09:52snowycoder[d]: Lots of loop analysis passes in cfg.rs use DFS with recursive functions, a deep enough CFG is bound to crash them
09:56karolherbst[d]: I wonder if the graph is de-facto cyclic even though that's kinda illegal
10:01phomes_[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1419987087012401152/enshrouded.svg?ex=68d3c187&is=68d27007&hm=0d7b34dbf93effa7a2601d077c80c69d4075746878b7bcd744fbf638c6b2461c&
10:01phomes_[d]: it does not seem so to me from looking at graphviz
10:03snowycoder[d]: cfg.rs has lots of recursive functions. we should replace recursive algorithms with non-recursive ones, it should also help with compile-time perf
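Note: the failure mode described above (recursive DFS exhausting the call stack on a deep CFG) goes away once the traversal keeps its own heap-allocated stack. A minimal sketch of a reachability walk, assuming a plain adjacency-list graph; `reachable`, `chain`, and the `succs` layout are hypothetical and do not reflect the actual cfg.rs data structures.

```rust
/// Returns which blocks are reachable from `start`, using an explicit
/// heap-allocated stack instead of recursion, so depth is bounded by
/// memory rather than by the thread's call stack.
/// `succs[b]` lists the successors of block `b` (a hypothetical
/// adjacency-list layout, not the actual cfg.rs data structure).
fn reachable(succs: &[Vec<usize>], start: usize) -> Vec<bool> {
    let mut seen = vec![false; succs.len()];
    let mut stack = vec![start];
    seen[start] = true;
    while let Some(b) = stack.pop() {
        for &s in &succs[b] {
            if !seen[s] {
                seen[s] = true;
                stack.push(s);
            }
        }
    }
    seen
}

/// Builds a straight-line chain 0 -> 1 -> ... -> n-1, the worst case
/// for recursion depth.
fn chain(n: usize) -> Vec<Vec<usize>> {
    (0..n)
        .map(|i| if i + 1 < n { vec![i + 1] } else { vec![] })
        .collect()
}
```

A chain of a few hundred thousand blocks is far deeper than any real shader, and this loop handles it with a `Vec` that simply grows as needed.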
10:54klm`: Hi! I'm trying nouveau on my NVIDIA GTX 1060 3G [10de:1c02] and I'm getting "NOUVEAU(0): Error creating GPU channel: -19" and "NOUVEAU(0): Error initialising acceleration. Falling back to NoAccel" in Xorg.log. Reading https://nouveau.freedesktop.org/FeatureMatrix.html , shouldn't my card be supported?
10:58karolherbst: klm`: mind pastebinning your dmesg?
10:58karolherbst: but also you shouldn't use the nouveau ddx at all at this point
11:16barryneedle: I suppose the multivalue variable length decoder can be done since the values nearing to infinity can be distinguished per each in contiguous set. But the parallel bank selection logic is on the way with complexity present so for 32bit it could be one million digits per bank. 4000 banks like this, so how would the bank be selected? say we aggregate the 8 fields together, every power has
11:16barryneedle: the same weight so all powers are equiprobable. Ok let's pretend we get a values minified into double encoded banks this way. so 3551th bank stores the answer of something, which has positive value which adds to another before it gets filtered with the access logic, so before you ever know the outcome of combining add, you already have to double the value i suppose, to leave the history
11:16barryneedle: behind for every bank for the next run which decodes all the banks in parallel and in internal logics in hw adds them. It is terribly long and complex but maybe possible, i no longer know i had given up i think, because i need to go to work to pay my expenses in life.
11:29ristovski[d]: man, that guy does _not_ get tired
11:55karolherbst[d]: it's been 10? or more years by now
11:59ristovski[d]: yeah >_>
12:10klm`: karolherbst: sure! https://pastebin.com/raw/WTnjrzmk (xorg) and https://pastebin.com/WTnjrzmk (dmesg)
12:10klm`: I'm using GNU Guix and it has loading nonfree-firmware disabled. Could that be it? I thought nouveau wouldn't need nonfree firmware blobs but that may obviously be wrong.
12:12karolherbst: ohh yeah, you need firmware
12:13karolherbst: also you pasted the same link twice
12:13karolherbst: but anyway.. not using firmware is an unsupported configuration, so you'll have to ask GNU Guix for support on this one
12:18karolherbst: klm`: since Maxwell 2nd gen (~900 series) we require signed firmware blobs for acceleration, which kinda means without that you only have software rendering available. The modesetting DDX _should_ handle this gracefully, and no idea why you are using the nouveau ddx, is that GNU Guix's doing? In which case it's another of their bugs. If you have
12:18karolherbst: an xorg config, you should remove it as this is also unsupported in most cases unless you really know what you are doing.
12:33barryneedle: You as free software developers paid for your tasks and jobs imo should be rather inflicted to try or say tempted to try such accelerator in software, which would be rather valueable the most valued hunk in the software preferred to anything like wayland , it's horribly needed just i no longer have anything to continue to try programming that, what makes fewest sense is to trashbin
12:33barryneedle: electronics after every five years, despite the fact that they are functional, but no longer keep up with newer products, it's pretty sad feeling to just trashbin solid snb laptop from apple etc. We need something to avoid such silly phenomenom of artificial amortization.
12:56spookytrevor: what you have done during that ten years makes very few sense overall, there has been zero progress after KMS -- kernel modesetting which made fast vt switch to be possible, it was the last and only innovation AFAIK. Then someone at another hand did new sysV init scripts for userspace well lennart poettering at red hat, it's pretty pointless so you have moved some bits from there to there
12:56spookytrevor: then back and forth without any vision or strategy of improvement. It's just waste of time and resources. Like you were playing chess and moved one button back and forth , back and forth. Simply no point whatsoever to do such stuff. And real projects you do not want to attend in, you want to bully and harass others hotels and whatnot on top with your lethally stupid cowardish tyranny based
12:56spookytrevor: and dangerous acts for others ongoing lives.
13:19gfxstrand[d]: chikuwad[d]: Typically, that means someone forgot to set `b->cursor`.
13:19klm`: karolherbst: Oh, I see. What firmware does nouveau require, its own or nonfree firmware from NVIDIA?
13:20karolherbst: klm`: nonfree firmware that is part of the official linux-firmware repository
13:20klm`: I thought the point of nouveau was to keep everything open source, including the firmware. But I'm probably missing something, and the project, I'm sure, is more than ambitious as it is.
13:20karolherbst: well the firmware needs to be signed with a private key only nvidia owns
13:20klm`: Ok I see, thanks.
13:21chikuwad[d]: gfxstrand[d]: I was missing an NIR pass (but now it segfaults further down)
13:21chikuwad[d]: but, will look into that too, thanks :D
13:21klm`: karolherbst: Then it won't run on GNU Guix as they aim to run entirely without nonfree software.
13:22klm`: I have an older NVIDIA card which worked, so I thought this might be the case on these newer cards too.
13:22gfxstrand[d]: phomes_[d]: Can you attach the SPIR-V or a single-shader fossil to the issue?
13:22karolherbst: sure, that's why I said it's a GNU Guix issue and they have to ensure the system still runs
13:23klm`: karolherbst: It intentionally doesn't run. But thanks for explaining, it's very much appreciated.
13:23karolherbst: like you should still be able to use your system with software acceleration and all that, but if a distribution makes changes putting them into an upstream unsupported configuration, they have to deal with the fallout
13:23gfxstrand: klm`: Yeah, if you're trying to run firmware-free you're stuck at Fermi and older
13:23karolherbst: klm`: not my point
13:23karolherbst: xorg and stuff still runs if you don't have hw acceleration
13:23karolherbst: just slower
13:23karolherbst: and if it doesn't, there is a problem with the configuration
13:24karolherbst: can't use the nouveau ddx e.g.
13:24karolherbst: (which you shouldn't use anyway)
13:24klm`: yes, I'm running without the firmware now and all screens are working and everything - so that's really great. I'll get an older card and just use it for the extra HDMI/DP connector that I need.
13:24klm`: gfxstrand: I see
13:25klm`: so now I'm looking for the cheapest Nvidia card with 2560x1440 support
13:25karolherbst: klm`: again, you should be able to do that even without firmware, you just won't get hw acceleration in userspace
13:26gfxstrand: You still need some hardware to drive the display
13:26karolherbst: yeah and that doesn't need firmware
13:26gfxstrand: ah
13:26esdrastarsis[d]: klm: You can use nonguix channel to install nonfree firmware
13:26karolherbst: the firmware is only really needed for context switching
13:26karolherbst: we even provided accelerated framebuffer on Ampere for a while before we got firmware there
13:26karolherbst: because using a single context was fine
13:26klm`: esdrastarsis[d]: I'd rather stick to free firmware on my Guix machine
13:27klm`: esdrastarsis[d]: but thanks for the pointer
13:27karolherbst: but this is just an issue with userspace messing up
13:27esdrastarsis[d]: klm: No problem
13:27karolherbst: the modesetting DDX should do just fine
13:27karolherbst: klm`: maybe just uninstall the nouveau ddx and it should just work (tm)?
13:27karolherbst: if so, might want to report it to GNU guix so they deal with it out of the box
13:28phomes_[d]: gfxstrand[d]: I attached the spir-v for two cases. I was unsure if we wanted those in gitlab or not
13:28klm`: karolherbst: I'm not following you. What issue will that solve?
13:29karolherbst: xorg not starting
13:29karolherbst: or wait..
13:29karolherbst: does it already start?
13:29klm`: karolherbst: oh, sorry, I've been unclear then. My system works fine
13:29karolherbst: ahh
13:29klm`: just no hw acceleration
13:29karolherbst: okay I see
13:30klm`: so I have what I need, I just won't be able to use my cards for rendering. I'll stick to my integrated graphics and use the nvidia card for the extra connector. This should work well.
13:30klm`: That was my setup these past few years and the nouveau drivers have been working flawlessly
13:31karolherbst: yeah.. well.. you won't get much performance out of this nvidia GPU anyway, because we lack the firmware to change the performance state and you'll be stuck on some lower perf level
13:31karolherbst: so the iGPU might be faster no matter what
13:43klm`: karolherbst: oh i see
13:45klm`: this: https://nouveau.freedesktop.org/FeatureMatrix.html makes it seem like things should work pretty flawlessly.
13:46klm`: I mean, since it says "DONE". Is it indicating that you get full performance or am I misreading that?
13:46karolherbst: `Power management`
13:49klm`: oh my gosh, there it is …
13:50klm`: hah, I can't believe I missed that
13:51klm`: how can I find out the max resolution a chip supports? I have a GeForce 210
13:51klm`: I know this may not be the right place to ask that, though
13:51karolherbst: uhhh good question, it depends on a lot of factors
13:52karolherbst: detection with hdmi was never perfect, but the hw specs should be good enough (tm)
14:36gfxstrand[d]: karolherbst[d]: Why are `NVK_MME_SCRATCH_FALCON_0/1/2` all set to 0?
14:36gfxstrand[d]: Is that right?
14:36gfxstrand[d]: I mean, falcon works so...
14:36gfxstrand[d]: But that feels funky
14:36karolherbst[d]: goooood question
14:37karolherbst[d]: I mean.. some of nvidia's macros do that
14:37karolherbst[d]: maybe some firmware internal thing, who knows
14:37gfxstrand[d]: macros do what?
14:37karolherbst[d]: like `mme_set_priv_reg` uses scratch 0
14:37karolherbst[d]: and it's read by the firmware to do the thing afaik
14:38karolherbst[d]: (and 1 and 2)
14:38karolherbst[d]: ohh you mean in the enum?
14:38gfxstrand[d]: Yeah
14:38karolherbst[d]: I guess so that nothing uses them
14:38gfxstrand[d]: We say 0, 1, and 2 are reserved and then set them all to 0
14:39gfxstrand[d]: And by "set them" I mean FALCON_2 = 0 in the enum
14:39karolherbst[d]: right...
14:39karolherbst[d]: I mean it's your code 😛
14:40karolherbst[d]: but
14:40gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/vulkan/nvk_mme.h#L50
14:40karolherbst[d]: I think you wanted to set them to 1 and 2
14:40karolherbst[d]: because now the cs invocation thing overlaps with them logically
14:40gfxstrand[d]: karolherbst[d]: Damnit! So it is. 😂
14:40karolherbst[d]: yeah
14:40karolherbst[d]: I mean
14:41karolherbst[d]: I see why you did this, but I think you wanted to use 1 and 2 😄
14:41gfxstrand[d]: Yeah
14:41karolherbst[d]: or...
14:41karolherbst[d]: dunno
14:41karolherbst[d]: use 0 as a canary and start the list with 3
14:41karolherbst[d]: and assert it's not used or whatever
14:42luc: where should i get the GPU sclk/mclk? i am using nouveau on GP108 (MX250). cat `/sys/kernel/debug/dri/0000:01:00.0/pstate` just shows '/sys/kernel/debug/dri/0000:01:00.0/pstate': No such device
14:45mohamexiety[d]: yeah reading the commit's description, it feels like it should have been 0 1 2
14:46mohamexiety[d]: (since it calls out the first 3 registers)
14:47marysaka[d]: yeah it uses the first 3 values
14:48marysaka[d]: some of those falcon methods write values for priv regs etc.
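Note: the enum mix-up discussed above (three reserved scratch registers all given the value 0, so the next entry overlaps them) is easier to see written out. The real definitions are C and live in nvk_mme.h; what follows is only a Rust sketch of the intended numbering, and the `DriverFirst`/`DriverSecond` names and `NUM_RESERVED` constant are hypothetical.

```rust
/// Hypothetical scratch-register numbering, loosely modelled on the
/// discussion above; the real definitions live in nvk_mme.h (C) and the
/// non-Falcon names here are made up.
#[repr(u32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MmeScratch {
    Falcon0 = 0,
    Falcon1 = 1,
    Falcon2 = 2,
    // First register free for driver use.
    DriverFirst = 3,
    DriverSecond, // implicitly 4
}

/// Number of registers reserved for firmware use.
const NUM_RESERVED: u32 = 3;

/// True if `reg` is one of the registers reserved for firmware use.
fn is_reserved(reg: MmeScratch) -> bool {
    (reg as u32) < NUM_RESERVED
}

// Compile-time check that driver registers start past the reserved range.
const _: () = assert!(MmeScratch::DriverFirst as u32 >= NUM_RESERVED);
```

Unlike C, Rust rejects duplicate discriminants at compile time, so the all-zero variant of this enum could not be written by accident; in C a static assertion on the first non-reserved value would serve the same purpose.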
14:49luc: i know that gp108 can't get reclocked due to signed firmware, but at least initial frequency should be seen somewhere i guess
15:03gfxstrand[d]: mohamexiety[d]: I attached a patch to your MR to include in the next round.
15:03mohamexiety[d]: yup picked it up
15:04stanstotev: well the formula to get the banks isn't such a hard job, you would naturally get there with linear algebra 57+57+base=114+base , so when you remove this 270and269 from 46 with the quadro arithmetic you get 21 which from 78 removed is 57 , so another 57 added and a base, now gets to 114+base as told, let's pretend other operand is the same so 46 from another would yield twice of 57, now this
15:04stanstotev: removed from constant of higher bank sum yields the needed bank address of a first result. for an example if you had 4000 of them one from 4000 is forwarded if you used the needed linear algebra, in other words every core of ALU is global so in theory the compilation is tremendous fast and uniform so the core alu scaffold never changes once those magic values get assembled. But i already
15:04stanstotev: know i am incapable to provide that without getting paid. after such example you would decode the intermediate banks with multivalue decoder into a higher value in a range and next up you would run the hash with real results gotten. I think i am capable of providing in one year altogether with every testing of the compiler, but currently i have ran out of money and need to go to work that
15:04stanstotev: get's me paid too.
15:23mohamexiety[d]: gfxstrand[d]: all done and added, review comments taken care of too
15:24gfxstrand[d]: mohamexiety[d]: Then CTS and merge
15:25mohamexiety[d]: I cant merge but will CTS
15:25gfxstrand[d]: You can't merge?!?
15:26mohamexiety[d]: yeah I dont actually have ability to assign to marge which is why for stuff I do review (e.g. https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37385#note_3100540) I just leave it after for someone else to pick up
15:26gfxstrand[d]: File an issue requesting access
15:26gfxstrand[d]: Then I'll add you
15:28mohamexiety[d]: gfxstrand[d]: here, right? https://gitlab.freedesktop.org/freedesktop/freedesktop/-/issues
15:29gfxstrand[d]: on mesa/mesa
15:29gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/?sort=created_date&state=opened&label_name%5B%5D=Project%20Access%20Requests&first_page_size=20
15:32mohamexiety[d]: yup, done
15:32mohamexiety[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13957
15:33gfxstrand[d]: Done
15:34zmike[d]: wtf I didn't even get to +1
15:34gfxstrand[d]: You can +1 after the fact
15:34zmike[d]: no I can't
15:34zmike[d]: you've robbed me of that chance
15:34gfxstrand[d]: Then just +1 instead
15:34zmike[d]: I cannot
15:34zmike[d]: the buttons no longer work
15:34karolherbst[d]: hey, I wanted as well
15:34zmike[d]: they have no meaning
15:35gfxstrand[d]: Feel free to re-open, +1, and close. 😛
15:35zmike[d]: nobody feels that free
15:45mohamexiety[d]: thanks! dusting off my deqp-runner script and will run CTS then try it out
15:59mhenning[d]: chikuwad[d]: If you're not already doing this, I recommend building with assertions on when you're debugging stuff like this. That will turn on some additional nir validation which will make it easier to see where certain kinds of bugs happen.
15:59chikuwad[d]: I'm building with asserts enabled, yeah
16:57pavelnestev: You know the difference would be that the alu answer sets would have the different PC offset , i.e shifted answers, but those are all stored in a machine word for one ALU, so compilation is all about just a data access of version PC2 of MUL answer set from mul word, then adding it to the program tape so, it would end up peing sw ASIP type of accelerator but compilation and operation are
16:57pavelnestev: both very very cheap and fast. so the skeleton of that arithmetic can be programmatically coded and generated in current software, but current software state has maybe your favors, but to be honest linux has been in the same state for decades already it has been somewhat successfully supporting hardware before you already, if you declare now that your results are bad cause of a spammer
16:57pavelnestev: called mart, one day you will say, something happened to me, only cause crow shitted on the window. So i have hard time to listen to your absurdish tyranny.
17:41klm`: karolherbst: thanks, I will dig it up and give this old card a try at 2560x1440
17:42gfxstrand[d]: Ugh... Trying to remember the best way to run a single SPIR-V file
17:46snowycoder[d]: gfxstrand[d]: If you're running into the Enshrouded shader build error, I have a hacky branch that removes the cfg.rs recursive code.
17:46snowycoder[d]: I haven't been able to CTS it since I'm still at work but it might be useful (it might also improve compile time?)
17:47gfxstrand[d]: For me it's getting stuck in liveness
17:47gfxstrand[d]: Nope. It compiles
17:48gfxstrand[d]: IDK what the max stack size ends up being
17:49snowycoder[d]: In my tests cfg.rs code can handle up to 10k CFG blocks (using unit tests) before crashing, but that limit is a bit close to real shader stats
17:49snowycoder[d]: Also, what are you using to run a single SPIR-V?
17:50gfxstrand[d]: nvdump in nv-shader-tools
17:50gfxstrand[d]: It blows up on the NVK binary but it compiles the SPIR-V
17:50gfxstrand[d]: Looks like this one has 11k blocks
19:04gfxstrand[d]: Trying to make the DFSs in cfg.rs not use recursion and ugh...
19:04snowycoder[d]: Wait, I have some code you can use
19:05gfxstrand[d]: oh?
19:06snowycoder[d]: https://gitlab.freedesktop.org/SnowyCoder/mesa/-/merge_requests/new?merge_request%5Bsource_branch%5D=nak_remove_cfg_recursion
19:06snowycoder[d]: Wrong link
19:07snowycoder[d]: https://gitlab.freedesktop.org/SnowyCoder/mesa/-/tree/nak_remove_cfg_recursion?ref_type=heads
19:07snowycoder[d]: It passes all unit tests, can't check integration because I'm booted with nvidia-proper and I'm doing benchmarks for work
19:09gfxstrand[d]: I think that silently extends some DFS to BFS
19:09snowycoder[d]: It shouldn't, I've always used a stack
19:09gfxstrand[d]: Yeah, but you push all the children onto the stack together
19:10gfxstrand[d]: Hrm... Maybe that works. I need to think
19:10snowycoder[d]: Only for reaches_dfs where the order doesn't really matter
19:12snowycoder[d]: The others should emulate a call stack properly, with weird state-machine like code -.-
19:12snowycoder[d]: We might abstract it into a more generic construction, I don't know which method would be easier to read
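Note: the distinction drawn above matters for any pass that needs a true post-order. Popping a block and pushing all of its children at once is enough for reachability, but it does not reproduce the order in which a recursive DFS finishes blocks. One way to emulate the call stack is to store, per frame, the block and the index of the next successor to visit. This is a sketch under the assumption of an adjacency-list graph; `post_order` and `succs` are hypothetical names, not code from the branch linked above.

```rust
/// Post-order DFS without recursion. Each stack entry is a frame
/// (block, index of next successor to visit), mirroring what the call
/// stack would hold in the recursive version.
fn post_order(succs: &[Vec<usize>], start: usize) -> Vec<usize> {
    let mut seen = vec![false; succs.len()];
    let mut order = Vec::new();
    let mut stack = vec![(start, 0usize)];
    seen[start] = true;
    while let Some(&(b, next)) = stack.last() {
        if next < succs[b].len() {
            // Advance this frame's cursor before descending, like
            // resuming a loop after a recursive call returns.
            if let Some(top) = stack.last_mut() {
                top.1 += 1;
            }
            let s = succs[b][next];
            if !seen[s] {
                seen[s] = true;
                stack.push((s, 0));
            }
        } else {
            // All successors handled: the block is finished.
            order.push(b);
            stack.pop();
        }
    }
    order
}
```

For a diamond-shaped graph (0 branches to 1 and 2, both of which jump to 3) this yields 3, 1, 2, 0, the same order a recursive implementation visiting successors left to right would produce.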
19:13gfxstrand[d]: I'm looking into a `DepthFirstSearch` trait
20:55mangodev[d]: mangodev[d]: is it mesa or the kernel that's acting up? over the past couple days, my drivers have been destroying themselves and i have no clue why
20:56mangodev[d]: the other day, i noticed atomic modeset stopped working
20:56mangodev[d]: and after today's kernel update, nouveau itself completely stopped working, and my system is now running off of llvmpipe
20:57mangodev[d]: and my system has no lib32 wayland-scanner, so i can't rebuild mesa
21:02mangodev[d]: i think the latest kernel versions screwed things up
21:04mhenning[d]: oh, weird I'm not seeing nvk out of my devenv either, although it seems to work in the devenv
21:05mangodev[d]: i think 6.16.8 broke it for me
21:05mangodev[d]: because .7 was working great
21:05mangodev[d]: .8 is now using llvmpipe
21:06mhenning[d]: yeah, I might be seeing the same thing. I'll try to root cause
21:07mangodev[d]: i'm also on an older mesa version because i can't build it since yesterday
21:07mangodev[d]: i rolled back linux and linux-headers
21:08mangodev[d]: damn
21:08mangodev[d]: still llvmpipe
21:08mangodev[d]: maybe it was linux-firmware-nvidia
21:10mangodev[d]: is it okay to roll back a single firmware package, or should i go through the effort to roll back all of them?
21:12mangodev[d]: i'm doing all of them just for safety
21:12mhenning[d]: mangodev[d]: Okay, it looks like this is just a libdisplay-info update. nvk needs to be recompiled against the new version
21:13mangodev[d]: mhenning[d]: but i can't recompile :(
21:13mangodev[d]: my build breaks on lib32
21:13mhenning[d]: well then switch to the distro package until you can compile again
21:13mohamexiety[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_binding_buffer.multiple.compute_comp_buffers32_sets1,Timeout
21:13mohamexiety[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_binding_buffer.multiple.graphics_comp_buffers32_sets1,Timeout
21:13mohamexiety[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_binding_buffer.multiple.graphics_frag_buffers32_sets1,Timeout
21:13mohamexiety[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_binding_buffer.multiple.graphics_vert_buffers32_sets1,Timeout
21:13mohamexiety[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_residency_buffer.multiple.compute_comp_buffers32_sets1,Timeout
21:13mohamexiety[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_residency_buffer.multiple.graphics_comp_buffers32_sets1,Timeout
21:13mohamexiety[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_residency_buffer.multiple.graphics_frag_buffers32_sets1,Timeout
21:13mohamexiety[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_residency_buffer.multiple.graphics_vert_buffers32_sets1,Timeout
21:13mohamexiety[d]: dEQP-VK.binding_model.descriptor_buffer.traditional_buffer.multiple.compute_comp_buffers32_sets1,Timeout
21:13mohamexiety[d]: dEQP-VK.binding_model.descriptor_buffer.traditional_buffer.multiple.graphics_comp_buffers32_sets1,Timeout
21:13mohamexiety[d]: dEQP-VK.binding_model.descriptor_buffer.traditional_buffer.multiple.graphics_frag_buffers32_sets1,Timeout
21:13mohamexiety[d]: dEQP-VK.binding_model.descriptor_buffer.traditional_buffer.multiple.graphics_vert_buffers32_sets1,Timeout
21:13mohamexiety[d]: is it normal for these to time out in a parallel deqp-runner run but pass in solo runs?
21:14mohamexiety[d]: they do take a long time (~1min) each but do pass eventually
21:16mhenning[d]: maybe? I set `timeout = 240` in the toml. I forget which tests I was adjusting the timeout for
21:16mohamexiety[d]: oooh I dont do that
21:16mohamexiety[d]: alright then noted. thanks!
21:17mhenning[d]: Yeah, I don't normally worry too much about the timeouts as long as they eventually pass
21:17mohamexiety[d]: Writing test log into TestResults.qpa
21:17mohamexiety[d]: dEQP Core unknown (0xcafebabe) starting..
21:17mohamexiety[d]: target implementation = 'Default'
21:17mohamexiety[d]: Test case 'dEQP-VK.info.device_extensions'..
21:17mohamexiety[d]: Fail (Unknown extension VK_KHR_maintenance9)
21:17mohamexiety[d]: this one is a bit funny, but I think it's just a CTS bug (1.4.3.3)
21:17mangodev[d]: mhenning[d]: how? my libdisplayinfo package hasnt updated in months
21:17mhenning[d]: mohamexiety[d]: Oh, yeah that's probably just your cts being too old
21:17gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536
21:18gfxstrand[d]: gfxstrand[d]: snowycoder[d] ^^
21:18mohamexiety[d]: mhenning[d]: this is the latest release :KEKW:
21:18gfxstrand[d]: mohamexiety[d]: Too old!
21:18gfxstrand[d]: Just ignore that test
21:18gfxstrand[d]: Your CTS is always too old
21:18mhenning[d]: mangodev[d]: You're on arch, right? https://archlinux.org/packages/extra/x86_64/libdisplay-info/ lists it as being updated yesterday
21:18gfxstrand[d]: You can take that as an axiom
21:19mohamexiety[d]: haha
21:19gfxstrand[d]: I've fought with people over that darn test for YEARS. CI people keep trying to turn it on in CI and then complaining when it "regresses" because someone added a feature.
21:20mohamexiety[d]: yeah makes sense
21:20mohamexiety[d]: looks like these always fail on blackwell due to modifiers:
21:20mohamexiety[d]: dEQP-VK.wsi.acquire_drm_display.acquire_drm_display_invalid_fd,Fail
21:20mohamexiety[d]: dEQP-VK.wsi.acquire_drm_display.acquire_drm_display_not_master,Fail
21:20mohamexiety[d]: dEQP-VK.wsi.acquire_drm_display.get_drm_display,Fail
21:20mohamexiety[d]: dEQP-VK.wsi.acquire_drm_display.get_drm_display_not_master,Fail
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.colorspace.basic,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.colorspace.hdr,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.colorspace_compare.r5g6b5_unorm_pack16,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.incremental_present.scale_none.fifo.identity.opaque.incremental_present,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.incremental_present.scale_none.fifo.identity.opaque.reference,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.incremental_present.scale_none.fifo.identity.pre_multiplied.incremental_present,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.incremental_present.scale_none.fifo.identity.pre_multiplied.reference,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.incremental_present.scale_none.immediate.identity.opaque.incremental_present,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.incremental_present.scale_none.immediate.identity.opaque.reference,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.incremental_present.scale_none.immediate.identity.pre_multiplied.incremental_present,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.incremental_present.scale_none.immediate.identity.pre_multiplied.reference,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.incremental_present.scale_none.mailbox.identity.opaque.incremental_present,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.incremental_present.scale_none.mailbox.identity.opaque.reference,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.incremental_present.scale_none.mailbox.identity.pre_multiplied.incremental_present,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.incremental_present.scale_none.mailbox.identity.pre_multiplied.reference,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.swapchain.create.image_format,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.swapchain.private_data.image_format,Crash
21:20mohamexiety[d]: dEQP-VK.wsi.wayland.swapchain.simulate_oom.image_format,Crash
21:22mangodev[d]: mhenning[d]: strange
21:22mangodev[d]: i'm just rebuilding mesa from before the meson changes, hopefully it should run?
21:23mangodev[d]: whyyyy libdisplay-info
21:23mangodev[d]: you add features i can't even use
21:23gfxstrand[d]: mohamexiety[d]: Uh oh...
21:25gfxstrand[d]: mohamexiety[d]: I have WSI blocked out for my CTS runs. 😕
21:25mohamexiety[d]: ah I should do that too
21:26mohamexiety[d]: not sure if these are new but they fail on main:
21:26mohamexiety[d]: dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp32.input_args.reflect_denorm_flush_to_zero,Fail
21:26mohamexiety[d]: dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp32.input_args.reflect_denorm_flush_to_zero_frag,Fail
21:26mohamexiety[d]: dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp32.input_args.reflect_denorm_flush_to_zero_vert,Fail
21:29mhenning[d]: Oh, yeah dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp32.input_args.reflect_denorm_flush_to_zero fails for me on main too on ampere
21:30mhenning[d]: passed on my previous cts run though
21:30mhenning[d]: I'll bisect
21:33phomes_[d]: gfxstrand[d]: this is looking really good so far. It fixed those crashes I had. I will give it a test on a bunch of other games
21:33mohamexiety[d]: aside from that, compute MME MR passes all CTS except these:
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.a2b10g10r10_uint_pack32,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.a2b10g10r10_unorm_pack32,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.b10g11r11_ufloat_pack32,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16_sfloat,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16_sint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16_snorm,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16_uint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16_unorm,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16g16_sfloat,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16g16_sint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16g16_snorm,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16g16_uint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16g16_unorm,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16g16b16a16_sfloat,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16g16b16a16_sint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16g16b16a16_snorm,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16g16b16a16_uint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r16g16b16a16_unorm,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r32_sfloat,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r32_sint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r32_uint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r32g32_sfloat,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r32g32_sint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r32g32_uint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r32g32b32a32_sfloat,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r32g32b32a32_sint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r32g32b32a32_uint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r8_sint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r8_snorm,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r8_uint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r8_unorm,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r8g8_sint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r8g8_snorm,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r8g8_uint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r8g8_unorm,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r8g8b8a8_sint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r8g8b8a8_snorm,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r8g8b8a8_uint,Fail
21:33mohamexiety[d]: dEQP-VK.image.device_scope_access.comp_comp.1d.r8g8b8a8_unorm,Fail
21:33mohamexiety[d]: (those pass on main, so it's something to do with that MR)
21:34gfxstrand[d]: phomes_[d]: If you're seeing other games OOM, it might help.
21:36gfxstrand[d]: There's another one in the phi repair pass which I manually converted a while ago. IDK that it's worth switching it to the trait.
22:06snowycoder[d]: gfxstrand[d]: Does it improve compile times? It should also help theoretically
22:06gfxstrand[d]: No idea
22:08snowycoder[d]: gfxstrand[d]: Thinking about lifetimes is making my head hurt, but the code is really nice!
22:08snowycoder[d]: (Doesn't ChildIter hold a read reference to dfs while we call `.edge` that has a `&mut` ref?)
22:08mhenning[d]: snowycoder[d]: I'd actually expect it to be slower - pushing to a vec is slower than incrementing the stack
22:08gfxstrand[d]: Yeah, but it's less data being pushed because we have a single reference to all the captured stuff for the whole DFS instead of repeating it for every level.
22:09gfxstrand[d]: snowycoder[d]: That's a good point!
22:09gfxstrand[d]: I have no idea why Rust is letting me get away with that.
22:09snowycoder[d]: But it still compiles so I guess it's ok?
22:10gfxstrand[d]: I think it's because `ChildIter` is only holding a non-mutable reference to the `nodes`, which is already a non-mutable reference within `WhateverDFS`
22:11gfxstrand[d]: It doesn't actually hold a reference to `WhateverDFS` itself. So as long as its lifetime doesn't grow beyond `WhateverDFS`, which is guarding the lifetime of `nodes`, it's valid.
22:12gfxstrand[d]: You can always re-borrow a non-mutable reference, as long as you don't extend the lifetime
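A minimal sketch of the borrow pattern described above, assuming a simplified explicit-stack DFS. `Node`, `Dfs`, `ChildIter`, `children`, `edge` and `preorder` are hypothetical stand-ins for illustration, not the actual code from the MR. The point it tries to show: `children` copies the inner `&'a [Node]` out of the struct, so the returned iterator is tied to the lifetime `'a` of the node slice rather than to the `&self` borrow, and a later `&mut self` call such as `edge` does not conflict with iterators still sitting on the stack.

```rust
// Hypothetical graph node: just a list of child indices.
struct Node {
    children: Vec<usize>,
}

// Hypothetical DFS state. It holds a shared reference to the nodes plus
// mutable bookkeeping that `edge` updates.
struct Dfs<'a> {
    nodes: &'a [Node],
    visited: Vec<bool>,
    order: Vec<usize>,
}

// Iterator over one node's children. It borrows the node slice for 'a,
// not the `Dfs` struct itself.
struct ChildIter<'a> {
    nodes: &'a [Node],
    node: usize,
    next: usize,
}

impl<'a> Iterator for ChildIter<'a> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        let child = self.nodes[self.node].children.get(self.next).copied();
        self.next += 1;
        child
    }
}

impl<'a> Dfs<'a> {
    // Returns ChildIter<'a>: `self.nodes` is a `&'a [Node]`, which is Copy,
    // so the result does not keep `self` borrowed after this call returns.
    fn children(&self, node: usize) -> ChildIter<'a> {
        ChildIter { nodes: self.nodes, node, next: 0 }
    }

    // Needs `&mut self`. Returns true if `to` has not been visited yet.
    fn edge(&mut self, _from: usize, to: usize) -> bool {
        if self.visited[to] {
            return false;
        }
        self.visited[to] = true;
        self.order.push(to);
        true
    }
}

// Pre-order traversal using a Vec as the stack instead of recursion.
fn preorder(nodes: &[Node], root: usize) -> Vec<usize> {
    let mut dfs = Dfs {
        nodes,
        visited: vec![false; nodes.len()],
        order: vec![root],
    };
    dfs.visited[root] = true;

    let mut stack = vec![(root, dfs.children(root))];
    while !stack.is_empty() {
        // Advance the iterator on top of the stack, then release the borrow
        // of `stack` before pushing or popping.
        let (node, next) = {
            let (node, iter) = stack.last_mut().unwrap();
            (*node, iter.next())
        };
        match next {
            Some(child) => {
                // `&mut dfs` here is fine: the ChildIters on the stack only
                // borrow `nodes`, never `dfs`.
                if dfs.edge(node, child) {
                    let iter = dfs.children(child);
                    stack.push((child, iter));
                }
            }
            None => {
                stack.pop();
            }
        }
    }
    dfs.order
}
```

If `children` were instead declared to return `ChildIter<'_>` (tied to the `&self` borrow), the `dfs.edge(...)` call would be rejected while any iterator is still alive on the stack.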
22:29mohamexiety[d]: mohamexiety[d]: ```
22:29mohamexiety[d]: // m_testType == TC_COMP_COMP
22:29mohamexiety[d]: // First compute shader executes that will flip the input image, image0, horizontally in output image, image1.
22:29mohamexiety[d]: // First compute shader makes image1 available in the shader domain with device scope barrier.
22:29mohamexiety[d]: // Pipeline barrier executes that adds execution dependency: second shader will only execute after the first has completed.
22:29mohamexiety[d]: // Then second compute shader executes making image1 visible in the shader domain and copies image1 back to image0 without any change.
22:29mohamexiety[d]: // image0 is compared to the reference output that should be flipped horizontally.
22:29mohamexiety[d]: hm I see
22:30mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1420175537867063378/image.png?ex=68d47109&is=68d31f89&hm=9eb8c2e846498144eb6d114f058f00eba05c80b081629961cd8f36e5e22c6a36&
22:30mohamexiety[d]: looking at the CTS fails, the output we get is just completely blank
22:37mohamexiety[d]: it's weird how it fails only for 1D though
22:37mhenning[d]: oh, that could be related to the sync bug I'm looking at
22:37snowycoder[d]: gfxstrand[d]: Ohhhh that makes a lot of sense actually, thanks!
22:37mohamexiety[d]: mhenning[d]: interesting. but this only shows with the compute MME MR :thonk:
22:37mohamexiety[d]: on main it passes
22:40mhenning[d]: Yeah, but the compute mme stuff removes some extra implicit synchronization
22:40mohamexiety[d]: oh right less WFIs
22:40mhenning[d]: Try cherry-picking this to see if it helps: https://gitlab.freedesktop.org/mhenning/mesa/-/commit/77257be41f4727773096f20fb9bb9e0c7588a28a
22:42mohamexiety[d]: mhenning[d]: yep, awesome work!
22:42mohamexiety[d]: DONE!
22:42mohamexiety[d]: Test run totals:
22:42mohamexiety[d]: Passed: 39/78 (50.0%)
22:42mohamexiety[d]: Failed: 0/78 (0.0%)
22:42mohamexiety[d]: Not supported: 39/78 (50.0%)
22:42mohamexiety[d]: Warnings: 0/78 (0.0%)
22:42mohamexiety[d]: Waived: 0/78 (0.0%)
22:43mohamexiety[d]: will hold off on the compute MME MR till that gets taken care of then. thanks!
22:44mhenning[d]: Okay, cool. I'm tracking that particular issue in https://gitlab.freedesktop.org/mesa/mesa/-/issues/13909
22:46mohamexiety[d]: alright
22:52mhenning[d]: mohamexiety[d]: I filed https://gitlab.freedesktop.org/mesa/mesa/-/issues/13961 for this