01:00airlied[d]: karolherbst[d]: what's left to remove the draft from the initial coop matrix?
02:21orowith2os[d]: mangodev[d]: How the dx12 api is implemented
02:22orowith2os[d]: And the features it uses
02:22orowith2os[d]: D3D is pretty picky, and you're implementing one explicit graphics API on top of another. There's bound to be conflicts of interest
02:24orowith2os[d]: Anything other than that is just driver optimizations
02:31orowith2os[d]: I'm sure faith and friends can give more details on all that
02:35orowith2os[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12506
02:35orowith2os[d]: This should also be of interest
02:39gfxstrand[d]: mangodev[d]: Descriptors. I'm planning to do a whole XDC talk on that this year if it gets accepted.
02:41gfxstrand[d]: mhenning[d]: Yeah, I wondered about that, too. Most of the time there aren't more than a handful of texture ops in a shader.
02:48mhenning[d]: yeah, but also you'd expect an array of u8s to vectorize really well. so which is faster, doing more work and vectorizing or doing less work and branching more? hard to guess
03:47tiredchiku[d]: yeah, iirc there's a big descriptor buffer overhead on nvidia hardware
03:47tiredchiku[d]: re: dx12
03:48tiredchiku[d]: the hardware is actually fine with d3d12 running natively, it's when the translation to vulkan happens that things go boom
03:49tiredchiku[d]: you can notice perf drops if you use vkd3d-proton on windows, for example
03:50tiredchiku[d]: the native d3d12 driver is fine, but vulkan doesn't map to the hardware as efficiently
03:50damo22: could that be a wine issue?
03:50tiredchiku[d]: no
03:50tiredchiku[d]: no wine involved on windows :)
03:51damo22: ah winblows
03:51tiredchiku[d]: (since the perf gap exists there too)
03:51tiredchiku[d]: I'm just talking about the hardware not meshing as well with vulkan as it does with d3d12
03:51kar1m0[d]: tiredchiku[d]: Well the proprietary drivers somehow manage to make it work
03:52kar1m0[d]: Not as good as native but still
03:52tiredchiku[d]: yes, because they're mature drivers that have had decades of work into them :p
03:52kar1m0[d]: tiredchiku[d]: On linux?
03:53tiredchiku[d]: yes
03:53tiredchiku[d]: nvidia drivers share a lot of the codebase between operating systems
03:53damo22: i thought vkd3d-proton was a fork of the wine vkd3d driver?
03:54tiredchiku[d]: damo22: correct, but it's become the de-facto d3d12-to-vulkan layer for heavier applications
03:54tiredchiku[d]: wine's vkd3d is still very barebones, iirc
03:55damo22: right
03:55mangodev[d]: gfxstrand[d]: ooooh nice
03:55mangodev[d]: can't wait for the day that releases :D
03:55mangodev[d]: tiredchiku[d]: is that why nvidia has distanced themselves from dx12 and has since gone all-in on vulkan?
03:56tiredchiku[d]: they have?
03:56damo22: so youre running something forked from linux that replaced windows that now runs on windows?
03:56damo22: my lord
03:57tiredchiku[d]: damo22: I was just pointing out that it's a difference between the nvidia dx12 driver (which only runs natively on windows) and the vulkan driver that causes the perf gap
03:57tiredchiku[d]: I personally haven't used windows in 6 years
03:58tiredchiku[d]: the windows example was just to eliminate other variables i.e. OS differences
03:58damo22: im just stirrin
04:00damo22: i should buy a gpu for LLM but i dont want to fork out lots of dough
04:01damo22: and all my machines are corebooted
04:04damo22: whats the biggest VRAM supported asic for nouveau?
04:07airlied[d]: what's the biggest one you can get?
04:07damo22: yeah
04:07airlied[d]: like I'm sure the 48GB Ada ones work
04:08damo22: oh ok, yea thats a bit out of my budget unfortunately :D
04:09damo22: bunnings sell them ??
04:09damo22: LOL
04:10kar1m0[d]: airlied[d]: It's weird that when I tried running wukong it told me that I don't have enough vram
04:10kar1m0[d]: Even though I played wukong before just fine on nvidia drivers
04:13airlied[d]: probably just hardcoded for something nvidia does
04:14damo22: ~AUD$13k for a computer component is ridiculous
04:14mangodev[d]: tiredchiku[d]: given the vulkan extensions and the more recently sponsored games using vulkan
04:14mangodev[d]: although i could be wrong, nvidia and microsoft are both not very predictable companies
04:19tiredchiku[d]: OH I meant more that there is a descriptor buffer overhead when translating d3d12 to vulkan
04:43orowith2os[d]: damo22: vkd3d-proton is just a set of dlls, nothing stops them from being used on Windows
04:43orowith2os[d]: Same reason you can just install other Windows dependencies
04:43orowith2os[d]: Like dotnet and the DirectX runtimes
04:44orowith2os[d]: And you see people (and GPU manufacturers!) using dxvk on windows
05:33karolherbst[d]: airlied[d]: nothing really, should probably do final cleanups and undraft
05:36damo22: orowith2os[d]: still, only the braindead use windows willingly, having a hardware TPM EOL the machine just because they want you to buy a new machine ought to be outlawed
05:44karolherbst[d]: airlied[d]: maybe fixed scheduling information might help, but...
05:56karolherbst[d]: but the problem is it needs review 😄
06:25airlied[d]: We don't review drafts very much
06:29snowycoder[d]: mhenning[d]: debugoptimized, I haven't checked release mode directly
06:48karolherbst[d]: airlied[d]: well it's undrafted now anyway
09:55karolherbst[d]: why is it doing that?!? https://gist.githubusercontent.com/karolherbst/b2054c27c9180a54e14939fe81a53cca/raw/20ff1777fa165dfb4ee321dd5531067867a082e6/gistfile1.txt
09:56karolherbst[d]: *sigh*
09:58karolherbst[d]: those aren't even used in a phi...
10:02karolherbst[d]: huh.. maybe I broke something...
10:08karolherbst[d]: uhh.. I see what it's doing
10:09karolherbst[d]: `gpr_limit` is set to 80 and it kinda hurts allocating vectors...
10:10karolherbst[d]: `Instruction count: 794` uwu
10:11karolherbst[d]: it can be better...
10:12karolherbst[d]: `Instruction count: 781` heh
10:14karolherbst[d]: now I'm finally like 5% faster than the original code 😄
10:18karolherbst[d]: okay.. the remaining par_copy instances that get added seem to come from something in my phi handling not working as well as it could
15:49mhenning[d]: karolherbst[d]: It's possible that's because it's running out of spaces to allocate vectors, in which case https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35245 might help
15:49karolherbst[d]: yeah, that's what's happening
15:49karolherbst[d]: let me try that MR 🙂
15:53karolherbst[d]: mhenning[d]: another idea: allocate 3, 7, 11, 15, etc.. first, so a vec3 could fit, but not sure if "vec3" even exists as a concept. But global load/stores and some tex operations might benefit
15:54karolherbst[d]: also.. what to do about " no method named `count_unset_groups_of_4` found for struct `BitSet` in the current scope"?
15:54karolherbst[d]: ohh wait
15:54karolherbst[d]: I only picked the top commit 🙃
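The log only shows the name `count_unset_groups_of_4`, not its body, so the following is a guess at what a helper with that name might compute: the number of aligned groups of four registers that are completely free. It is a standalone sketch over plain `u32` words (a hypothetical stand-in, not Mesa's `BitSet`), where a set bit means the register is in use.

```rust
/// Hypothetical sketch: count aligned groups of four bits that are all unset.
/// Bit i set means "register i is in use". This is NOT the Mesa implementation.
fn count_unset_groups_of_4(words: &[u32]) -> u32 {
    let mut count = 0;
    for &w in words {
        // Fold the four bits of every nibble into the nibble's lowest bit.
        let any = w | (w >> 1) | (w >> 2) | (w >> 3);
        // After masking, the lowest bit of a nibble is 1 if any register
        // in that group of four is in use.
        let used_groups = any & 0x1111_1111;
        // Each 32-bit word holds 8 groups; subtract the ones that are used.
        count += 8 - used_groups.count_ones();
    }
    count
}
```

With an empty word all 8 groups are free; setting a single bit anywhere in a group removes exactly that group from the count.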
15:56mhenning[d]: karolherbst[d]: global doesn't use vec3s but I think the texture ops can in some cases. could be worth trying
15:57karolherbst[d]: ohh it was `AL2P`
15:57karolherbst[d]: and `ALD`
15:58karolherbst[d]: and `AST`
15:58mhenning[d]: oh yeah, I'm not sure if we use the 96-bit versions of those but they do exist
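The "allocate 3, 7, 11, 15 first" idea can be sketched as a scalar allocation order that prefers the last slot of each aligned group of four, leaving slots 0..=2 of the group free for a 96-bit vector. Everything below is hypothetical: the `used` slice, both function names, and the assumption that a vec3 starts at a 4-aligned register are illustrative choices, not NAK code.

```rust
/// Pick a free scalar register, preferring indices 3, 7, 11, ... so the
/// first three slots of each aligned group of four stay available.
/// `used[i]` is true when register i is taken. Hypothetical helper.
fn pick_scalar(used: &[bool]) -> Option<usize> {
    // First pass: only registers whose index is 3 modulo 4.
    let preferred = (0..used.len())
        .filter(|&i| i % 4 == 3)
        .find(|&i| !used[i]);
    // Fallback: any free register, lowest index first.
    preferred.or_else(|| used.iter().position(|&u| !u))
}

/// Find the lowest 4-aligned base whose first three slots are free.
/// Assumes (for illustration) that a vec3 must start at a 4-aligned register.
fn pick_vec3(used: &[bool]) -> Option<usize> {
    (0..used.len() / 4)
        .map(|group| group * 4)
        .find(|&base| !used[base] && !used[base + 1] && !used[base + 2])
}
```

In a 16-register file, four scalar allocations land on 3, 7, 11 and 15, and a vec3 still fits at base 0. Only once the preferred slots run out does a scalar fall back to the lowest free register.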
16:00karolherbst[d]: so without my phi/vec stuff your MR helps a little `2846` => `2838` static cycle count, but no other change
16:02karolherbst[d]: with my stuff it increased instruction use, but that might be randomness... or maybe something interacts very weirdly
16:05gfxstrand[d]: mhenning[d]: We do but they don't happen often enough to optimize for.
16:07karolherbst[d]: I think it triggers a bug in my stuff more aggressively
16:08karolherbst[d]: I identify `%r1578 => {%r177 %r178}.0` and `%r1579 => {%r177 %r178}.1` matching across phis, but then `%r1578 = copy rZ` => `r28` and `%r1579 = copy rZ` => `r31` which is clearly not a great thing to do 🙃
16:09karolherbst[d]: should be `r28` and `r29` obviously
16:09karolherbst[d]: this happens a few times in a row (28, 31, 34, 37, 40, 43, etc...) and then all the vec slots are gone
16:10karolherbst[d]: and now I need 210 instead of 130 regs 😄
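To illustrate why the 28, 31, 34, ... pattern eats vector slots so quickly, here is a minimal sketch that counts how many aligned slots of a given width remain free for a set of occupied scalar registers. The function name, the 48-register file size, and the assumption that vectors must be aligned to their width are all hypothetical choices for the example.

```rust
/// Count aligned `width`-register slots in 0..num_regs that contain no
/// occupied register. Hypothetical helper for illustration only.
fn free_aligned_slots(occupied: &[u32], num_regs: u32, width: u32) -> u32 {
    (0..num_regs / width)
        .filter(|&slot| {
            let base = slot * width;
            // The slot is free only if no occupied register falls inside it.
            !occupied.iter().any(|&r| r >= base && r < base + width)
        })
        .count() as u32
}

/// Registers from the log: every third register, starting at r28.
const STRIDED: [u32; 6] = [28, 31, 34, 37, 40, 43];
/// The same number of scalars packed back to back from r28.
const PACKED: [u32; 6] = [28, 29, 30, 31, 32, 33];
```

Under these assumptions, in a 48-register file the strided placement blocks six aligned pairs and four aligned groups of four, while the packed placement blocks only three pairs and two groups of four for the same six scalars.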
16:36karolherbst[d]: https://gist.githubusercontent.com/karolherbst/b536b173a762931b66604273877162fe/raw/c80f33218b6b1f54e5af94ea0b9a5b6a1095b5a5/gistfile1.txt nice shader
16:40karolherbst[d]: airlied[d]: though no idea if it's correct, but I think it's starting to look pretty good
16:41kar1m0[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1379862713492307978/image.png?ex=6841c8cb&is=6840774b&hm=6512365eec004993f3a59c49f3decd8166ebe18522392da7a4008a69e1da3c95&
16:43karolherbst[d]: I should clean up that code tomorrow
16:44karolherbst[d]: 38 -> 42 TFlops now with being smarter about fp16 vectors in phis