01:37fdobridge_: <airlied> @gfxstrand https://people.freedesktop.org/~airlied/nvk-cts-1.3-full.tar.xz (it's ~100MB)
04:18graphitemaster: HdkR, current state of things https://gist.github.com/graphitemaster/d21ee546a1316fe55a7a040b7e3f4648 XD
04:19graphitemaster: Just tested and validated on 4 GPUs and 4 drivers seems to work fine. I just realized I have a worse problem
04:21graphitemaster: OpenGL, Vulkan, and Metal all have no real way to communicate across threadgroups, but a device-level scan is genuinely useful, so a lot of things (Nanite, for example) rely on undefined behavior and assume they can.
04:22graphitemaster: The specific algorithm in this case is chained scan with decoupled lookback for a single-pass prefix-sum
04:23graphitemaster: So I'm trying to find a pretty solid way to implement such a primitive that works everywhere now. HLSL has globallycoherent which helps a lot. GL has coherent but it's not global as far as I can tell.
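[Sketch] For readers unfamiliar with the algorithm named above, here is a minimal CPU-side simulation of a chained scan with decoupled look-back, written in C with pthreads and C11 atomics. Everything in it (the names `scan_worker`, `scan_run`, `tile_state`, the tile size, the thread count) is an assumption made for illustration and is not taken from graphitemaster's code. It relies on the OS scheduler for forward progress between workers, which is exactly the guarantee the GPU APIs discussed here do not give across threadgroups, so it shows the data flow only, not a portable GPU implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define N_ITEMS   1024u
#define TILE      64u
#define N_TILES   (N_ITEMS / TILE)
#define N_THREADS 4

/* Per-tile status: nothing published, local sum only, or full inclusive prefix. */
enum { FLAG_INVALID = 0, FLAG_AGGREGATE = 1, FLAG_PREFIX = 2 };

static uint32_t scan_input[N_ITEMS];
static uint32_t scan_output[N_ITEMS];
static _Atomic uint64_t tile_state[N_TILES]; /* flag in high 32 bits, value in low 32 */
static atomic_uint next_tile;

static uint64_t pack(uint32_t flag, uint32_t value)
{
    return ((uint64_t)flag << 32) | value;
}

static void *scan_worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* Claim tiles in increasing order so every predecessor is already owned. */
        unsigned t = atomic_fetch_add(&next_tile, 1u);
        if (t >= N_TILES)
            return NULL;

        /* Step 1: local reduction, published right away as an aggregate. */
        uint32_t aggregate = 0;
        for (unsigned i = 0; i < TILE; i++)
            aggregate += scan_input[t * TILE + i];
        atomic_store(&tile_state[t], pack(FLAG_AGGREGATE, aggregate));

        /* Step 2: look back over predecessors until one has a full prefix. */
        uint32_t exclusive = 0;
        for (unsigned p = t; p-- > 0;) {
            uint64_t s;
            do {
                s = atomic_load(&tile_state[p]); /* spin while nothing is published */
            } while ((s >> 32) == FLAG_INVALID);
            exclusive += (uint32_t)s;
            if ((s >> 32) == FLAG_PREFIX)
                break;
        }

        /* Step 3: publish the inclusive prefix so later tiles can stop here. */
        atomic_store(&tile_state[t], pack(FLAG_PREFIX, exclusive + aggregate));

        /* Step 4: write the inclusive scan for this tile. */
        uint32_t running = exclusive;
        for (unsigned i = 0; i < TILE; i++) {
            running += scan_input[t * TILE + i];
            scan_output[t * TILE + i] = running;
        }
    }
}

static void scan_run(void)
{
    pthread_t threads[N_THREADS];

    atomic_store(&next_tile, 0u);
    for (unsigned t = 0; t < N_TILES; t++)
        atomic_store(&tile_state[t], pack(FLAG_INVALID, 0));

    for (int i = 0; i < N_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, scan_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(threads[i], NULL);
}
```

To try it, fill `scan_input`, call `scan_run()`, and read the inclusive prefix sum from `scan_output`. On a GPU the `atomic_load`/`atomic_store` pairs would have to become device-scope coherent accesses (the role `globallycoherent` plays in HLSL), and the spin loop is only safe if earlier tiles are guaranteed to make progress.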
05:09HdkR: nice
11:14fdobridge_: <!DodoNVK (she) 🇱🇹> ~~NVK 8K gaming when?~~
11:30fdobridge_: <Sid> when we get DSC working
13:42fdobridge_: <zmike.> I get good perf with it on ANV/RADV/NVIDIA, so I'd guess there's an issue somewhere other than the game's renderer
14:22fdobridge_: <!DodoNVK (she) 🇱🇹> So I guess NVK is far from the micro-optimization stage :amd:
15:29fdobridge_: <triang3l> I wonder if premature pushing of things into the common runtime is going to end up being the root of some evil here along the way :frog_gears:
15:31fdobridge_: <triang3l> I think, by the way, I eventually need to run some performance tests for bitscans and indirect calls for dirty state flags (what I'm using currently) vs. just lots of BITSET_TEST ifs in Terakan (edited)
15:43fdobridge_: <triang3l> maybe the latter would even be faster with branch prediction and no function call overhead, even if that means having lots of test/jz pairs for every draw (though a few dozen is not really "lots", I think)
15:45fdobridge_: <triang3l> for static state in pipelines I think my callback array approach is probably terrible, since it's more common there for nearly all state to be static, which means lots of loop and function call overhead; I should definitely throw that away
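[Sketch] The two dirty-state dispatch styles being compared could look roughly like this in C. This is a hypothetical illustration, not Terakan code: the state names, `struct demo_ctx`, and both `flush_*` functions are made up, a plain `uint32_t` mask stands in for Mesa's BITSET macros, and `__builtin_ctz` assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dynamic states; a real driver has a few dozen. */
enum { STATE_VIEWPORT, STATE_SCISSOR, STATE_BLEND, STATE_COUNT };

struct demo_ctx {
    uint32_t dirty;              /* one bit per state */
    int emitted[STATE_COUNT];    /* counts emissions, for demonstration only */
};

typedef void (*emit_fn)(struct demo_ctx *);

static void emit_viewport(struct demo_ctx *c) { c->emitted[STATE_VIEWPORT]++; }
static void emit_scissor(struct demo_ctx *c)  { c->emitted[STATE_SCISSOR]++; }
static void emit_blend(struct demo_ctx *c)    { c->emitted[STATE_BLEND]++; }

static const emit_fn emit_table[STATE_COUNT] = {
    [STATE_VIEWPORT] = emit_viewport,
    [STATE_SCISSOR]  = emit_scissor,
    [STATE_BLEND]    = emit_blend,
};

/* Style A: bit scan over the dirty mask, one indirect call per set bit. */
static void flush_bitscan(struct demo_ctx *c)
{
    uint32_t d = c->dirty;
    while (d) {
        int i = __builtin_ctz(d);  /* index of the lowest set bit */
        assert(i < STATE_COUNT);
        emit_table[i](c);
        d &= d - 1;                /* clear the lowest set bit */
    }
    c->dirty = 0;
}

/* Style B: one test-and-branch per state, direct calls the compiler can inline. */
static void flush_branches(struct demo_ctx *c)
{
    if (c->dirty & (1u << STATE_VIEWPORT)) emit_viewport(c);
    if (c->dirty & (1u << STATE_SCISSOR))  emit_scissor(c);
    if (c->dirty & (1u << STATE_BLEND))    emit_blend(c);
    c->dirty = 0;
}
```

Style A does work proportional to the number of dirty bits but pays an indirect call for each one; style B always executes every test but stays branch-predictor friendly. Which one wins is the thing that would need measuring.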
15:50fdobridge_: <triang3l> I wish we could have some cross-platform JIT for vkCmdBindPipeline… LLVM? :happy_gears: (Although that's not needed for drivers that might be fine with just emitting the hardware packets directly and just precompiling the final packet sequence, rather than writing the state to some intermediate structure first. Maybe RADV in theory to some extent with indirect context/uconfig register setting)
16:07fdobridge_: <prop_energy_ball> vkCmdBindPipeline should not be your bottleneck...
16:07fdobridge_: <prop_energy_ball> I've never seen that in perf at all
16:08fdobridge_: <prop_energy_ball> It's not really worthwhile talking about optimizing stuff until you have metrics that it would actually be impactful
16:08fdobridge_: <prop_energy_ball> In the past I did a descriptor template JIT for Mesa, and it turned out to be basically the same perf...
17:38fdobridge_: <gfxstrand> Having conferences at Google offices is awesome. The WiFi here is amazing!
17:39fdobridge_: <phomes_> Vulkanised?
17:39fdobridge_: <gfxstrand> Yup
17:40fdobridge_: <gfxstrand> Yeah, this. I have never seen any of that stuff actually show up as a CPU bottleneck ever. The only pipeline binding thing I ever really saw show up was an issue with push constant and descriptor flushing on Intel and you could only hit that with absurdly mean microbenchmarks.
17:41fdobridge_: <gfxstrand> And even that was totally solvable in the driver with the current runtime architecture.
17:58fdobridge_: <rinlovesyou> On doom 2016? what card do you have?
18:01fdobridge_: <gfxstrand> I suspect there's something really dumb we're doing with `vkGetQueryPoolResults()` or something like that.
18:01fdobridge_: <rinlovesyou> zink + nvk looks like this on the game :Hehe:
18:01fdobridge_: <rinlovesyou> https://cdn.discordapp.com/attachments/1034184951790305330/1204124636938248304/20240204191800_1.jpg?ex=65d39775&is=65c12275&hm=0ed603e3ab7670176df1643580e5b4e85780760d3140c06e68ec257d289f081b&
18:02fdobridge_: <rinlovesyou> minecraft does reasonably well with zink & nvk, gets to around 100fps when uncapped on my 2070 super (edited)
18:07fdobridge_: <huntercz122> with sodium?
18:07fdobridge_: <rinlovesyou> yes
18:07fdobridge_: <Sid> 🐸
18:07fdobridge_: <rinlovesyou> it's also somehow the first game i was able to attach renderdoc to without it crashing and burning
18:07fdobridge_: <rinlovesyou> https://cdn.discordapp.com/attachments/1034184951790305330/1204126169956360222/image.png?ex=65d398e2&is=65c123e2&hm=0898a3dde124904745e6acc0ea355bfc9ee866d21c57118c27bab691fdc5c6de&
18:07fdobridge_: <huntercz122> iirc i had around 300fps
18:07fdobridge_: <Sid> my 1660Ti laptop does 240-320 on the proprietary driver
18:07fdobridge_: <huntercz122> 8 chunks tho
18:08fdobridge_: <Sid> w/ sodium and starlight
18:08fdobridge_: <huntercz122> same
18:08fdobridge_: <rinlovesyou> yes on the proprietary i get into the upper 400s as well
18:08fdobridge_: <Sid> ah ok
18:08fdobridge_: <rinlovesyou> i'm talking nvk here
18:08fdobridge_: <Sid> yes we're aware 😅
18:08fdobridge_: <huntercz122> there's one with vulkanmod
18:08fdobridge_: <huntercz122> https://cdn.discordapp.com/attachments/1034184951790305330/1204126517572141116/297499897-dd5c8be0-03bf-4fb3-9c29-48788de07d82.png?ex=65d39935&is=65c12435&hm=3fe28844746d337281ab71b3d0c8fc9b1e397cc2770c6062360ad25e499fbd22&
18:08fdobridge_: <Sid> just that the 2070S is a fair bit more beefy than our 1660TiMs, so
18:09fdobridge_: <Sid> ..wait that's nvk?
18:09fdobridge_: <huntercz122> yes
18:09fdobridge_: <Sid> amazing
18:09fdobridge_: <rinlovesyou> now i need to see this vulkanmod
18:09fdobridge_: <Sid> https://modrinth.com/mod/vulkanmod
18:09fdobridge_: <Sid> no shader support btw
18:09fdobridge_: <rinlovesyou> gotta yeet sodium & iris first tho
18:10fdobridge_: <rinlovesyou> yeah
18:10fdobridge_: <huntercz122> it's also a month-old screenshot
18:10fdobridge_: <Sid> :o
18:10fdobridge_: <huntercz122> https://github.com/xCollateral/VulkanMod/issues/330#issuecomment-1896590432
18:12fdobridge_: <huntercz122> smh somewhat not closed
18:14fdobridge_: <rinlovesyou> i can't turn off vsync 😭
18:14fdobridge_: <rinlovesyou> keeps le crashing
18:15fdobridge_: <rinlovesyou> ok it worked in the menu
18:15fdobridge_: <rinlovesyou> definitely awesome, gets up to 300fps for me
20:12fdobridge_: <zmike.> RX5700/6800/7900, DG2, 2070
20:12fdobridge_: <zmike.> it's considered normal to get 200fps in opengl mode on any competent driver
21:31mindovermatter: 524+524+627+627+931+931 -24-230-93
21:31mindovermatter: 12+12+115+627+931+931 i can actually see, that it actually works, but if number is above three fourths 3/4 of the bound compiler has to adjust the 93 to 512 in this example, but there is a routine, but cheap queue jumping is more complex cause the former examples access would happen only in ascending order but there are two derived methods, queue jumping on ascending order and queue jumping on arbitrary fixed position storing
21:31mindovermatter: or reading. That it would be cheap in terms of storage not as contiguous indexes such as 512 1024 1536 etc. but cheaper, then compiler has to be clever and execute little smarter routine which should end up still quite thin, but little extra is added to the analysis, i already had success on those ones but this has more details and are long to explain on banners environment, but to mark the range or length of the compressed
21:31mindovermatter: data can be anything not only 1024 but 65336 weighted or bounded and above, so technically it does work. But i never looked if elias fano is similar. I think my methods are thinnest possible,, just hard to read others inventions.
21:33mindovermatter: so those numbers represent what happens when some number of 512 or 1924 is taken off, and the other column is the subtract of value and inverse that gets added
21:33mindovermatter: to get those values in such order
21:49Lyude: hm. Went to try implementing the ref/unref stuff for atomic states in rust, discovered basically all of the refcounting infrastructure is inlined
21:52Lyude: that's a bit painful :S
21:52Lyude: wonder if we can just uninline the get/put functions for various modesetting objects and call it a day
22:07lionelhard: Now you spam my phone with your famous victory over a finnish mindill monster who i do not want by any means anymore, not close to my dad businesses never in million years around me with her magpies, now if you think the assaults come in cause i have something special in computing , they do not come in that way, they come in cause mental insitution made a fraud in my personal case, evidence came in that they found a permute in my
22:07lionelhard: genes with the help of scientists in bioengineering , so they take my cells and send them to great britain where they get researched, similarly in labs in ukraine etc. they make vaccines this way and bunch of other stuff that cures diseases, they try to project and silence the fraud by knocking me out, no person no worries so to speak, it's not like some other mob is after me, like do not release anti gravitation vehiccles or we
22:07lionelhard: kill you, this girl who 2.5 years bothered me always sucked other dicks and pussies and turned off the head when i wanted to kiss her , and extorted money from me only and harassed my dads businesses in cambodia with their cranks and thieves who had no pennies , those brittish assaulters and bombers are just filty trash as mwk is , they are not nasa scientists so to speak, so do not spam my phone about the girl i do not wanr any
22:07lionelhard: of those people around me, i do not care what she does until they do not harass my dads business , hotels and so on, or else we are going very brutal against those people. Me i am going to die it was more than 100 assaults i faced and last ones were pretty successful finally , but not as successful that i died on spot, so that is pity for them. We gonna hit back.
23:30Lyude: (turns out the answer was: rust/helpers.c is intended exactly for the thing I needed)
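[Sketch] The pattern Lyude is referring to, in miniature: a `static inline` C function has internal linkage, so there is no exported symbol for Rust to link against, and a small out-of-line wrapper fixes that. The object type and function names below are hypothetical stand-ins, not real DRM or kernel APIs; only the `rust_helper_` prefix is meant to echo the naming used in rust/helpers.c.

```c
#include <assert.h>

/* Hypothetical refcounted object, standing in for a modesetting object. */
struct demo_obj {
    int refcount;
};

/* Header-style inline get/put: internal linkage, invisible to foreign code. */
static inline void demo_obj_get(struct demo_obj *obj)
{
    obj->refcount++;
}

static inline void demo_obj_put(struct demo_obj *obj)
{
    assert(obj->refcount > 0);
    obj->refcount--;
}

/* Out-of-line wrappers with external linkage that a Rust binding can call. */
void rust_helper_demo_obj_get(struct demo_obj *obj)
{
    demo_obj_get(obj);
}

void rust_helper_demo_obj_put(struct demo_obj *obj)
{
    demo_obj_put(obj);
}
```

This keeps the C side unchanged (nothing has to be uninlined) at the cost of one extra call from Rust.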
23:43Lyude: btw airlied, I'm gonna send out the nova/rvkms announcement today if that's alright. Did you want a chance to look at it first, or should I just go ahead and send it?