04:54OftenTimeConsuming: How do I find out the default pstate? It's not listed under /sys/kernel/debug/dri/0/pstate and 07 doesn't seem right as it's very unstable.
07:51fdobridge: <airlied> @karolherbst https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29284 might be good to have nvc0 report useful stuff
07:51fdobridge: <airlied> @zmike how do you feel about cleaning up the zink renderer string, not sure it needs a vulkan version in it
07:54fdobridge: <karolherbst🐧🦀> yeah, next week as I'm off this one :ferrisUpsideDown:
08:48fdobridge: <!DodoNVK (she) 🇱🇹> @airlied @dwlsalmeida What's the progress on Vulkan video for NVK? Last I heard, some types of frames were working for H.264 decoding :nouveau:
14:58fdobridge: <zmike.> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21922
15:10fdobridge: <triang3l> calling the function twice, assuming that `VK_DRIVER_ID_` is in the beginning of the string in one place and that it can be anywhere in the other :blobcatnotlikethis:
18:07fdobridge: <airlied> @zmike might clean it up a bit, just want to see how gnome displays it in GUI
18:15fdobridge: <zmike.> you do you
18:15fdobridge: <zmike.> also you need to actually hit tab when you @ping someone on discord or else it doesn't work
18:15fdobridge: <airlied> Ah yeah discord ftw
18:18fdobridge: <pixelcluster> except when discord doesn't like you and pings random people anyway
18:21fdobridge: <zmike.> I've never had that happen
18:23fdobridge: <pixelcluster> i've had it happen too often 🫠
18:24fdobridge: <pixelcluster> seems like it'll do whatever you don't want it to
18:24fdobridge: <zmike.> skill issue I guess
19:22fdobridge: <gfxstrand> @redsheep My nvk/cbuf0 branch is now way more believable. You still probably won't see a perf boost without `NVK_DEBUG=no_cbuf` but at least it isn't an unholy pile of hacks anymore.
19:22fdobridge: <gfxstrand> Next NVK task is to try and sort out bindless cbufs which is what we need to get real perf
19:34fdobridge: <gfxstrand> Turns out that big GPUs really don't like stalls. 😅
19:39fdobridge: <redsheep> I'll give it a try tonight, sounds like fun. So it sounds like you're expecting it to be more stable or predictable? Or faster again?
19:40fdobridge: <!DodoNVK (she) 🇱🇹> What do you mean by "big"? Big CUDA core count? Big die size?
19:40fdobridge: <redsheep> Do you still see slow CTS?
19:41fdobridge: <gfxstrand> fewer bugs, hopefully
19:41fdobridge: <gfxstrand> core count, mostly
19:41fdobridge: <redsheep> Older gpus can have a huge die without being all that wide by current standards
19:41fdobridge: <gfxstrand> The wider the GPU, the more stalling hurts.
19:51fdobridge: <redsheep> I wouldn't be terribly surprised if, earlier on with NVK, the entire performance difference I had over those on Turing mobile was down to Ada's really amped up clock speeds, which IIUC should mean getting past stalls faster
19:52fdobridge: <gfxstrand> yeah
20:40fdobridge: <redsheep> Something interesting I didn't mention from my last round of cb0 branch testing: before now I've never really heard coil whine while on NVK. I think my higher fps testing in The Witness is the only time I recall hearing it even a little
20:40fdobridge: <redsheep> So it's getting the GPU to pull more current, which makes sense with getting more utilization
21:08fdobridge: <djdeath3483> cbuf stands for convolution buffer?
21:10fdobridge: <redsheep> I thought it was constant buffer
21:10fdobridge: <gfxstrand> Constant buffer. 😝
21:42fdobridge: <djdeath3483> Ah boring 🙂 Like Intel's push constant stuff?
21:44fdobridge: <gfxstrand> Yes, except ours is measured in megabytes. 😝
21:44fdobridge: <djdeath3483> noiiice
21:45fdobridge: <gfxstrand> I mean, yes and no. I'm trying to stop using it because it doesn't pipeline well
21:48fdobridge: <gfxstrand> The blob driver is almost entirely using bindless cbufs which I haven't wired up in NAK yet.