00:50gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29591
00:52gfxstrand[d]: Everything's passing CTS. If folks want to do a bit of perf testing, go for it. I don't expect it'll make much difference one way or another but if it massively regresses something, I'd like to know.
00:53gfxstrand[d]: I'll start rebasing bindless UBO tomorrow.
01:07gfxstrand[d]: The scheduler bits are a bit scorched earth yet but hopefully it won't hurt perf too bad
01:07gfxstrand[d]: It should also be more-or-less ready for real latencies
01:21anarsoul: hey folks, can someone who's subscribed to dri-devel re-CC me on https://lists.freedesktop.org/archives/dri-devel/2024-May/455512.html ?
01:22anarsoul: it looks like Ben dropped me from CC list for this patch, and he doesn't reply on gitlab issue either
01:37airlied: anarsoul: done
01:38anarsoul[m]: Thanks!
12:39phomes_[d]: gfxstrand[d]: I have tested the MR with the 7 games I installed atm. Everything still works. I did not notice any major improvement or regression
12:50pavlo_kozlenko[d]: karolherbst[d]: I remember that the topic of "automatic reclocking" was discussed here. So here's what I'm thinking: perhaps we should reclock the video card to maximum frequencies for all calls, and keep a list of specific calls for which that will not be done
12:50karolherbst[d]: pavlo_kozlenko[d]: yeah.. though the issue is, that for Pascal it requires firmware we'll never get. So that's kinda a really tough topic for any beginner
12:51pavlo_kozlenko[d]: karolherbst[d]: I speak for all video cards
12:51pavlo_kozlenko[d]: nv40-nv120
12:52pavlo_kozlenko[d]: I'm just suggesting such a scheme for automatic reclocking
12:52pavlo_kozlenko[d]: We'd need to see how it is implemented for AMD or Intel
12:53karolherbst[d]: mostly in firmware
12:53karolherbst[d]: and even in nouveau, where it's done, it's mostly in our own self-written firmware
12:54pavlo_kozlenko[d]: can we do it at the kernel level?
12:54karolherbst[d]: not really
12:55karolherbst[d]: there are good reasons to do it on the GPU, because for e.g. memory reclocking you have to disable VRAM for a moment
12:55karolherbst[d]: and that needs access to functionality you don't really get on a kernel level
14:49nanokatze[d]: gfxstrand[d]: speaking of bindless ubos, is the `ucld` thing feasible for PhysicalStorageBuffer with NonWritable set?
14:49babblebones[d]: gfxstrand[d]: Ooh, I suppose this applies to realtime VR compute shaders submitted?
14:50nanokatze[d]: no
14:51nanokatze[d]: nanokatze[d]: that's one of the paths vkd3d-p uses to implement root cbv (the other being push descriptor) and it's the nicer of the two in terms of cpu perf
14:52babblebones[d]: Just graphics queue scheduling?
14:53nanokatze[d]: it's scheduling of instructions within a shader
14:53nanokatze[d]: not higher priority dispatches
14:53babblebones[d]: I see, ty
15:16gfxstrand[d]: nanokatze[d]: Yes, I think so. I need to take a hard look at NonWritable but I think we can at least use `ld.global.constant`
15:17nanokatze[d]: ok cool
15:19gfxstrand[d]: LDC is faster yet but comes with caching implications that I don't think SSBOs satisfy
15:20gfxstrand[d]: But `ld.global.constant` is pretty dang fast
15:20nanokatze[d]: gfxstrand[d]: well I guess it's unspecced but I think the expectation is that ssbo with NonWritable is equivalent to ubo
15:22nanokatze[d]: in at least that you need to establish proper ordering between previous stores and successive loads for things to work and such
15:25gfxstrand[d]: Yes, you have to ensure ordering, but you don't have to do a full flush.
15:25gfxstrand[d]: They're not the same
15:26gfxstrand[d]: They're close but not the same
15:26gfxstrand[d]: We can do driver work to make them the same but that will involve more UBO cache invalidation. It's unclear until we get closer on perf which of those will win.
18:19gfxstrand[d]: Uh oh... Why is The Witness running at 45 FPS?!?
18:21gfxstrand[d]: Something to look at later, I guess.
18:30HdkR: Witnessing some very cinematic framerates :)
18:33dadschoorse[d]: do the ptx video instructions map to hw instruction or are they just for convenience?
18:34redsheep[d]: gfxstrand[d]: With no cbuf or no?
18:35HdkR: dadschoorse[d]: Video instructions have been long dead. Since like Fermi/Tesla or something
18:35redsheep[d]: Speaking of, when I do get around to testing does that env variable make it exercise the new code more, or less?
18:36gfxstrand[d]: redsheep[d]: Either(ish). I'm not sure what's going on
18:37redsheep[d]: There's got to be some better way to tell what the hardware has going on
18:37redsheep[d]: Like actually see occupancy and why
18:37dadschoorse[d]: HdkR: ah, kind of a shame, they sound cool. but they were probably also not used much
18:38HdkR: dadschoorse[d]: It also was likely better to just decompose them to regular integer operations
18:38HdkR: Some of the random modifiers and things might have survived?
18:39redsheep[d]: gfxstrand[d]: Doesn't the Radeon profiler or similar have useful stuff for this? Is that remotely within the realm of possibility to wire up to work, or get something similar going?
18:41gfxstrand[d]: Not without figuring out perf queries in the kernel first
18:41dadschoorse[d]: HdkR: I mean, amd still supports integer output clamp, packed int16, absdiff for all int sizes. until rdna2 it also had 16/8 extract modifiers
18:43dadschoorse[d]: but not sure how much sense that makes
18:43HdkR: Yea, I wouldn't be surprised if clamp modifier made it through, and a few other things :)
18:44redsheep[d]: gfxstrand[d]: Is this the same as the performance counters stuff you were talking about before? I assume somebody needs to do some RE work to figure out what's what?
18:44HdkR: I think all packed integer math was deleted at some point though?
18:49redsheep[d]: Suppose I could browse through the openrm code and see if anything is mentioned about this
18:54gfxstrand[d]: gfxstrand[d]: It's also 45 FPS on main with and without no_cbuf. I'm starting to think it might not be my branch.
19:03triang3l[d]: gfxstrand[d]: Is NonWritable, but not Restrict, and some other resource not being Restrict, something to actually care about, by the way?
19:04triang3l[d]: also coherence in a scope wider than the invocation might possibly be important
19:28redsheep[d]: I wonder how much umd magic is involved with the nsight perf sdk, it would probably be the best tool to try to set up. Also wonder if that works on openrm yet, I should test it.
19:29redsheep[d]: I assume it should be possible to trace it and add whatever it needs
19:52mhenning[d]: triang3l[d]: In general, yes we need to handle cases like ssbos aliasing (potentially with arbitrary offsets)
19:52mhenning[d]: This kind of stuff is specified in the vulkan memory model
22:07biker_rat: identify vf500f
22:07tiredchiku[d]: don't leak your passwords in chat!
22:08biker_rat: how to change password?
22:08HdkR: If it is a password strong enough to brute force in seconds, is it really a password at all?
22:09biker_rat: only risk is being banned if someone uses my name to cause insult
22:10biker_rat: anyway, nvidia is closing a barn door in linux systems today that they left open with their driver
22:12biker_rat: I am a non-developer type. Is there any chance their gigantic Turing-and-newer firmware puts the same opening in nouveau?
22:13tiredchiku[d]: nouveau on kernels newer than 6.7 already supports GSP, if that's what you mean
22:13tiredchiku[d]: it's the basis of the vulkan driver, NVK, being written in Mesa
22:14biker_rat: NVIDIA announced a bomb-sized security hole in non-latest versions of their drivers today.
22:18biker_rat: I don't know if they like mesa enough to speak to whether mesa is affected. So a mesa developer with enough experience to understand their security bulletin might want to read it. I am not educated enough in this area to do so.
22:19clangcat[d]: biker_rat: was it a bug in the NVIDIA prop drivers?
22:20biker_rat: https://nvidia.custhelp.com/app/answers/detail/a_id/5551
22:20clangcat[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1248763622784765992/grafik.png?ex=6664d940&is=666387c0&hm=d117865f34f554f94d3e36aa3c81eea76c15ea3180e085feee9949b507e84ef9&
22:20clangcat[d]: Ahhh, I am guessing you mean these. Mesa is likely not affected since it doesn't use Nvidia code. Not unless the bug was in the firmware itself
22:21clangcat[d]: Yea no those are just in the Nvidia prop drivers. Obviously Nouveau can have bugs also but as far as I know they do not share code and likely wouldn't be affected.
22:22biker_rat: I have no idea what the problem is from reading it. But that it is a bad problem, that I can understand from reading it.
22:28biker_rat: I am worried they wouldn't even check if your code was vulnerable if the root cause is in firmware. I worked for twenty years in an industrial plant engineering environment with a bad corporate culture. Those guys wouldn't spend a second of their time looking at the impact on other companies if they didn't fear a lawsuit.
23:02airlied[d]: If the fw is the root cause we are kinda screwed at the moment since we have no nice way to upgrade to a newer fw
23:32friendlyinvader[d]: clangcat[d]: 0090 and 0091 are almost certainly related
23:32clangcat[d]: friendlyinvader[d]: Yea very likely but not something that will affect mesa
23:33clangcat[d]: airlied[d]: Yea if the firmware is the problem. I haven't read it too much but it seems like mesa would be fine
23:33friendlyinvader[d]: Yeah, this is something in the user space driver that's allowing people to poke shit in their device driver
23:34friendlyinvader[d]: Which they're worried could lead to privilege escalation and all the fun things that brings :weirdCate:
23:36clangcat[d]: friendlyinvader[d]: Yea that's not great. Mainly just that you don't want anyone with kernel level
23:38friendlyinvader[d]: clangcat[d]: I'm sure nobody could do anything wrong with that level of privilege :p