06:04fdobridge: <Esdras Tarsis> Ben updated his patches, now my bug with disp is fixed and there are no more magic numbers in the switch case :)
06:04fdobridge: <Esdras Tarsis> now we need gsp-5303002.bin instead of gsp-5258902.bin I think
13:47fdobridge: <karolherbst🐧🦀> that would be quite cool actually, because the new gsp apparently has more stuff implemented
14:09fdobridge: <gfxstrand> @karolherbst🐧 Does NV have any instructions that can take or return a vec3 other than tex ops?
14:09fdobridge: <gfxstrand> I know the global read/write can't.
14:10fdobridge: <gfxstrand> All the fp64 stuff is vec2
14:10fdobridge: <gfxstrand> Really, any non-power-of-two is a potential problem
14:12fdobridge: <karolherbst🐧🦀> mhh, i'm sure there are vec3 loads.. let me check
14:14fdobridge: <karolherbst🐧🦀> @gfxstrand inter stage data load/stores can do vec3
14:14fdobridge: <karolherbst🐧🦀> `AL2P`, `ALD` and `AST`
14:19fdobridge: <gfxstrand> Ok. A bit silly but okay.
14:20fdobridge: <karolherbst🐧🦀> yeah.. but
14:20fdobridge: <karolherbst🐧🦀> you can treat it as vec4
14:21fdobridge: <karolherbst🐧🦀> it's still vec4 aligned anyway, just have the 4th component free for other stuff
14:22fdobridge: <gfxstrand> Yeah
14:22fdobridge: <gfxstrand> That's the question: Do we actually care about vec3 or can I just round up to vec4 in RA?
14:22fdobridge: <karolherbst🐧🦀> I'm sure there is this corner case where it might save us gprs
14:22fdobridge: <karolherbst🐧🦀> but
14:23fdobridge: <karolherbst🐧🦀> gprs are allocated in blocks of 4 anyway
14:23fdobridge: <karolherbst🐧🦀> so the chance it actually makes a difference is super low
14:23fdobridge: <karolherbst🐧🦀> but
14:23fdobridge: <karolherbst🐧🦀> we might want to detect this case for shader stats
14:23fdobridge: <karolherbst🐧🦀> I just don't think it's a 0.1%+ performance thing
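A rough sketch of why rounding vec3 up to vec4 in RA rarely changes anything: if GPRs are handed out in blocks of 4 (the figure quoted above, not checked against hardware docs), one saved register only matters when it crosses a block boundary. The function name and the `granule` parameter are made up for illustration.

```python
def allocated_gprs(used: int, granule: int = 4) -> int:
    """Round a shader's GPR count up to the allocation granule.

    The granule of 4 is taken from the chat above and is an assumption,
    not a documented hardware value.
    """
    if used < 0:
        raise ValueError("register count cannot be negative")
    # Ceiling division, then scale back up to a multiple of the granule.
    return -(-used // granule) * granule


# A shader that uses 4 other registers plus one vec3 or one vec4 value.
for width in (3, 4):
    used = 4 + width
    print(f"vec{width}: uses {used}, allocates {allocated_gprs(used)}")
```

In this example both cases land in the same block, so the vec3 saves nothing; it only helps when the total sits exactly one above a multiple of the granule.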
14:25fdobridge: <gfxstrand> If it's only texture sources and inter-stage loads, I don't think it'll ever matter.
14:25fdobridge: <gfxstrand> Can you have a vec3 texture destination?
14:29fdobridge: <karolherbst🐧🦀> yes, so for scalar texture ops it actually matters
14:29fdobridge: <karolherbst🐧🦀> because then you have a vec2 + scalar dest
14:29fdobridge: <karolherbst🐧🦀> we just don't do scalar, so it's all vec4
14:29fdobridge: <karolherbst🐧🦀> for tex it will actually matter
14:31fdobridge: <karolherbst🐧🦀> or well.. matter more, because vec2 vs scalar is a bigger deal
14:31fdobridge: <karolherbst🐧🦀> all those scalar texture ops are there to give RA a lot of freedom about placing values
14:50fdobridge: <gfxstrand> what do you mean vec2 + scalar dest?
14:50fdobridge: <gfxstrand> There's two destinations for tex ops?
14:51fdobridge: <karolherbst🐧🦀> well.. yes so
14:51fdobridge: <karolherbst🐧🦀> tex operations have this .SCR flag on turing
14:51fdobridge: <karolherbst🐧🦀> (also on maxwell, but it's a different thing)
14:51fdobridge: <karolherbst🐧🦀> and that makes the vec4 dest to be a pair of vec2
14:51fdobridge: <karolherbst🐧🦀> but if you only need one destination, it's a scalar dest
14:52fdobridge: <karolherbst🐧🦀> if you need two, you have a vec2
14:52fdobridge: <karolherbst🐧🦀> if you need three, it's vec2 + scalar
14:53fdobridge: <karolherbst🐧🦀> maybe that's always there with turing....
14:53fdobridge: <karolherbst🐧🦀> but for the sources you have something similar going on
14:53fdobridge: <karolherbst🐧🦀> ahh yeah.. on turing+ only the sources are affected by .SCR
14:55fdobridge: <karolherbst🐧🦀> for maxwell/pascal you have this beauty: https://gitlab.freedesktop.org/mesa/mesa/-/commit/f821e80213e38e93f96255b3deacb737a600ed40
14:56fdobridge: <karolherbst🐧🦀> how this works on turing+ is, that you have a list of sources and balance them between the two (up to vec4) sources
14:57fdobridge: <karolherbst🐧🦀> .SCR only works if you have 4 or fewer sources
14:59fdobridge: <karolherbst🐧🦀> 1: Src0 == Ra.x
14:59fdobridge: <karolherbst🐧🦀> 2: Src0 == Ra.x, Src1 == Rb.x
14:59fdobridge: <karolherbst🐧🦀> 3: Src0 == Ra.x, Src1 == Ra.y, Src2 == Rb.x
14:59fdobridge: <karolherbst🐧🦀> 4: Src0 == Ra.x, Src1 == Ra.y, Src2 == Rb.x, Src3 == Rb.y
14:59fdobridge: <karolherbst🐧🦀> and they are either scalar (no alignment constraints), or vec2
15:00fdobridge: <karolherbst🐧🦀> hope that's understandable enough?
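The four `.SCR` cases listed above follow one pattern: the first register gets the first half of the source list, rounded up, and the second register gets the rest. A minimal sketch of that rule, with a hypothetical function name and plain Python lists standing in for registers:

```python
def balance_scr_sources(srcs):
    """Split 1 to 4 texture sources across the two source registers (Ra, Rb).

    Mirrors the table in the chat: Ra receives ceil(n / 2) sources and
    Rb receives the remainder. Each side ends up scalar or vec2.
    """
    n = len(srcs)
    if not 1 <= n <= 4:
        raise ValueError(".SCR form only works with 1 to 4 sources")
    split = (n + 1) // 2
    return list(srcs[:split]), list(srcs[split:])


for n in range(1, 5):
    ra, rb = balance_scr_sources([f"Src{i}" for i in range(n)])
    print(n, "Ra =", ra, "Rb =", rb)
```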
15:05fdobridge: <karolherbst🐧🦀> and because it makes sense, the balancing works differently for the destination
15:06fdobridge: <karolherbst🐧🦀> so 1st and 2nd channel always goes into the first reg
15:06fdobridge: <karolherbst🐧🦀> 3rd and 4th always in the second
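For the destination side, the description above (scalar, vec2, vec2 + scalar, vec2 + vec2) can be read as "fill the first register before touching the second". The sketch below assumes that reading, treating "channel" as the n-th enabled result; the function name is hypothetical.

```python
def split_tex_dests(num_channels: int):
    """Return how many result channels land in (first_reg, second_reg).

    Assumes the first two enabled channels always go to the first
    destination register and any remaining ones to the second.
    """
    if not 1 <= num_channels <= 4:
        raise ValueError("a tex op returns 1 to 4 channels")
    first = min(num_channels, 2)
    return first, num_channels - first


for n in range(1, 5):
    print(n, split_tex_dests(n))
```

Unlike the source balancing, three results give a vec2 in the first register and a scalar in the second, and two results never spill into the second register.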
15:06fdobridge: <karolherbst🐧🦀> pre volta actually, you only got a vec4 dest register
15:07fdobridge: <karolherbst🐧🦀> well
15:07fdobridge: <karolherbst🐧🦀> except for those fancy scalar ones in maxwell+pascal
15:08fdobridge: <karolherbst🐧🦀> also the reason why it's so restricted there, because you actually need 16 more bits to encode the 2nd dest and src reg
15:08fdobridge: <karolherbst🐧🦀> and you only got 64 in total 🙂
15:15fdobridge: <karolherbst🐧🦀> @airlied seems like with GSP I can run the mme tests, which is good enough 🙂
15:15fdobridge: <karolherbst🐧🦀> but I do have some hard display regressions with the newest kernel 😄
15:38fdobridge: <gfxstrand> I thought there were tex ops with more than 4 args
15:38fdobridge: <karolherbst🐧🦀> yes, for those you can't use the `.SCR` flag
15:39fdobridge: <gfxstrand> Okay... How do those work then?
15:39fdobridge: <gfxstrand> vec4 + remainder?
15:39fdobridge: <karolherbst🐧🦀> like they do in codegen atm
15:39fdobridge: <karolherbst🐧🦀> in non SCR form you have two vec4 sources
15:39fdobridge: <karolherbst🐧🦀> and just fill them from the start
15:42fdobridge: <karolherbst🐧🦀> 1st source: LocClamp[27:16] | array[15:0], s, t, r
15:42fdobridge: <karolherbst🐧🦀> 2nd source: sampler[31:20] | header[19:0], lod, off[11:0], DC
15:42fdobridge: <karolherbst🐧🦀> in `.SCR` form you ignore the association between source and type and just have a unified list you balance across the 2 vec2 sources according to the above rules
15:44fdobridge: <karolherbst🐧🦀> yeah.. so in the normal mode each source type has its fixed place in the vec4 pair
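In the non-`.SCR` form, the first 32-bit word of each vec4 source is a packed bitfield. A small sketch of the packing, using only the bit ranges quoted above (`LocClamp[27:16] | array[15:0]` and `sampler[31:20] | header[19:0]`); the helper names are made up and the meaning of each field is not spelled out beyond what the chat says.

```python
def _field(value: int, bits: int, name: str) -> int:
    """Check that value fits in an unsigned field of the given width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")
    return value


def pack_src0_word0(loc_clamp: int, array: int) -> int:
    # LocClamp[27:16] | array[15:0]
    return (_field(loc_clamp, 12, "loc_clamp") << 16) | _field(array, 16, "array")


def pack_src1_word0(sampler: int, header: int) -> int:
    # sampler[31:20] | header[19:0]
    return (_field(sampler, 12, "sampler") << 20) | _field(header, 20, "header")


print(hex(pack_src0_word0(0x123, 0x4567)))
print(hex(pack_src1_word0(0xABC, 0x12345)))
```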
15:47bylaws: airlied wdym by it's not a register?
15:48bylaws: i mean the counter itself, I assume it isn't reset upon counter reports?
15:48bylaws: since it has to reset to 0 at some point
16:02fdobridge: <gfxstrand> I think he means that there isn't a state register that is the offset. There's a register to query the offset. (edited)
16:05fdobridge: <gfxstrand> When is it reset? Probably when the buffer address or size is changed but we don't have docs.
16:06bylaws: Right that seems reasonable, thanks
16:34fdobridge: <karolherbst🐧🦀> okay.. progress
16:34fdobridge: <karolherbst🐧🦀> on GSP that macro loops forever
16:34fdobridge: <karolherbst🐧🦀> or maybe I just used it incorrectly
16:35fdobridge: <karolherbst🐧🦀> ehh nvm.. wrong increment mode
16:36fdobridge: <karolherbst🐧🦀> IT WORKS!!!
16:36fdobridge: <karolherbst🐧🦀> @gfxstrand that macro works :3
16:36fdobridge: <karolherbst🐧🦀> the test: https://gist.githubusercontent.com/karolherbst/98217df7df9ce7a9bb3d0e112222fb09/raw/0cbacb169466763f2a04c7fa805a067e89a08501/test.shader_test
16:37fdobridge: <karolherbst🐧🦀> without the macro: 1 vs 15
16:37fdobridge: <karolherbst🐧🦀> with the macro: 15 vs 15 :3
16:37fdobridge: <karolherbst🐧🦀> okay....
16:37fdobridge: <karolherbst🐧🦀> fun fact.. I didn't provide any buffer yet 😄
16:37fdobridge: <karolherbst🐧🦀> I hope it's now in allow everything mode
16:38fdobridge: <karolherbst🐧🦀> let me try to mess with random regs
16:39fdobridge: <karolherbst🐧🦀> heh...
16:39fdobridge: <karolherbst🐧🦀> it doesn't let me 😄
16:39fdobridge: <karolherbst🐧🦀> now I'm curious about the error
16:39fdobridge: <karolherbst🐧🦀> didn't get one..
16:44fdobridge: <gfxstrand> @karolherbst🐧 What macro? You got a GSP FALCON macro to work?
16:44fdobridge: <karolherbst🐧🦀> yeah
16:44fdobridge: <karolherbst🐧🦀> I was able to enable helper invocation memory loads through that `mme 33` macro :3
16:44fdobridge: <karolherbst🐧🦀> entirely from userspace
16:44fdobridge: <karolherbst🐧🦀> so that one uses `NVC597_SET_FALCON04` for its stuff
16:45fdobridge: <karolherbst🐧🦀> no idea what `NVC597_SET_FALCON09` is all about honestly
16:45fdobridge: <karolherbst🐧🦀> next step would be to use the mme builder
16:45fdobridge: <karolherbst🐧🦀> seems like if I try to write into reg `0x200` GSP reaps the channel
16:45fdobridge: <karolherbst🐧🦀> I'll try to figure out what else I can do with it 😄
16:46fdobridge: <karolherbst🐧🦀> just to make sure it doesn't open the door to weird security problems
16:46fdobridge: <karolherbst🐧🦀> anyway.. I'm sure it doesn't work with the old school firmware now
16:46fdobridge: <karolherbst🐧🦀> and userspace will need a "runs on GSP" flag entirely for that
17:06fdobridge: <gfxstrand> If we really wanted to make it transparent to userspace, we could have the kernel run a pushbuf on context create but I'm also happy to have Mesa do it as long as we know all this stuff before we actually merge to main so we don't have any actual backwards compatibility problems.
17:18fdobridge: <karolherbst🐧🦀> yeah.. so the idea would be that the device info ioctl gets a new field: `runs_gsp` or something, and depending on that mesa is doing its thing or not
17:18fdobridge: <karolherbst🐧🦀> and for the non gsp case, we just whack that reg in the kernel
17:18fdobridge: <karolherbst🐧🦀> it's unfortunate, but I don't want to figure out how to do this the same way with the nvidia provided old school firmware
17:18fdobridge: <karolherbst🐧🦀> if it's like disabled and all
17:39fdobridge: <gfxstrand> Yeah
17:39fdobridge: <gfxstrand> What happens if you run a FALCON4 macro with the old firmware? No-op? Context loss? Something in-between?
17:41fdobridge: <Mohamexiety> ```
17:41fdobridge: <Mohamexiety> [ 3.861135] nouveau 0000:01:00.0: gsp: firmware "nvidia/ga102/gsp/gsp-5258902.bin" loaded - 37324728 byte(s)
17:41fdobridge: <Mohamexiety> [ 3.960264] nouveau 0000:01:00.0: gsp: firmware "nvidia/ga102/gsp/bootloader-5258902.bin" loaded - 12388 byte(s)
17:41fdobridge: <Mohamexiety> [ 4.171455] nouveau 0000:01:00.0: gsp: firmware "nvidia/ga102/gsp/booter_load-5258902.bin" loaded - 37752 byte(s)
17:41fdobridge: <Mohamexiety> ```
17:41fdobridge: <Mohamexiety> gsp works on ga102 \o/
17:47fdobridge: <karolherbst🐧🦀> nothing
17:47fdobridge: <karolherbst🐧🦀> or at least nothing I could observe
17:47fdobridge: <karolherbst🐧🦀> I can try again, now that I know that my stuff works
17:47fdobridge: <karolherbst🐧🦀> but I think with the non gsp stuff it just didn't make any difference
17:47fdobridge: <karolherbst🐧🦀> might also require me to do more on the kernel side
17:47fdobridge: <karolherbst🐧🦀> anyway, having the userspace side figured out helps here 🙂
17:59fdobridge: <airlied> For the nvidia supplied old fw we have to setup the table. GSP must keep the register list in the fw.
18:00fdobridge: <airlied> Also not sure if we should have a big gsp in use switch or make it more finegrained
18:01fdobridge: <airlied> I suppose we could report gsp fw version in use at least
18:02fdobridge: <airlied> Maybe stick a gsp in driver strings
18:09fdobridge: <karolherbst🐧🦀> there is an "allow all" config option
18:09fdobridge: <karolherbst🐧🦀> the buffer is an optional thing
18:09fdobridge: <karolherbst🐧🦀> I don't even provide a buffer for gsp at all either
18:10fdobridge: <airlied> I think the gsp fw provides it
18:10fdobridge: <karolherbst🐧🦀> potentially
18:10fdobridge: <airlied> but maybe it doesn't, seems weird that nvidia would default to an insecure mode
18:10fdobridge: <karolherbst🐧🦀> anyway.. knowing that the userspace side works helps a lot figuring out the kernel side
18:10fdobridge: <karolherbst🐧🦀> it's not fun having to figure out both at once 😄
18:10fdobridge: <karolherbst🐧🦀> yeah..
18:10fdobridge: <karolherbst🐧🦀> sooo
18:10fdobridge: <karolherbst🐧🦀> allow all is `0` even
18:11fdobridge: <airlied> like openrm never sets up a list
18:11fdobridge: <karolherbst🐧🦀> https://github.com/NVIDIA/open-gpu-doc/blob/master/manuals/volta/gv100/dev_ctxsw.ref.txt#L105
18:12fdobridge: <karolherbst🐧🦀> nouveau currently sets it to: `#define NV_CTXSW_MAIN_IMAGE_PRIV_ACCESS_MAP_CONFIG_MODE_ALLOW_ALL 0x00000000` 🙃 (edited)
18:13fdobridge: <airlied> if openrm doesn't do, then the gsp fw has to do it, there isn't anywhere else to hide it
18:13fdobridge: <karolherbst🐧🦀> it's part of the ctx header
18:13fdobridge: <airlied> which I assume is under the fw control not the open bits
18:13fdobridge: <karolherbst🐧🦀> yeah, I don't know
18:13fdobridge: <airlied> otherwise they'd clearly be setting it in openrm
18:13fdobridge: <airlied> you can't set it in userspace
18:13fdobridge: <karolherbst🐧🦀> I have no idea how the contexts are created under gsp
18:14fdobridge: <karolherbst🐧🦀> I meant the header kernelspace creates
18:14fdobridge: <airlied> kernelspace doesn't create it for gsp
18:14fdobridge: <karolherbst🐧🦀> it's part of the golden context stuff?
18:14fdobridge: <airlied> yes that seems likely
18:15fdobridge: <karolherbst🐧🦀> maybe there are some toggles or something?
18:15fdobridge: <karolherbst🐧🦀> anyway.. will check that out later this week
18:15fdobridge: <airlied> not sure what there would be to toggle here, it makes little sense for nvidia to allow creating insecure gsp contexts
18:15fdobridge: <karolherbst🐧🦀> right
18:15fdobridge: <karolherbst🐧🦀> but they had the option with their current driver already
18:16fdobridge: <karolherbst🐧🦀> and it's still an open question if it's indeed disabled in the old school firmware, which I'm fairly sure should be disabled
18:16fdobridge: <karolherbst🐧🦀> otherwise we have to file a CVE 🙃
18:16fdobridge: <airlied> I think nvgpu is using the same old school fw so we should be able to follow that
18:16fdobridge: <karolherbst🐧🦀> no
18:16fdobridge: <karolherbst🐧🦀> it doesn't
18:17fdobridge: <airlied> oh you mean old school nouveau context fw
18:17fdobridge: <karolherbst🐧🦀> Nvidia handed out custom made firmware for nouveau (edited)
18:17fdobridge: <airlied> I thought you meant old school nvidia
18:17fdobridge: <karolherbst🐧🦀> it's not the same as what Nvidia's driver is using
18:17fdobridge: <karolherbst🐧🦀> yeah
18:17fdobridge: <airlied> oh skeggsb is pretty sure the fecs/gr fw was the same
18:17fdobridge: <airlied> the custom stuff was more for the secure boot bits etc
18:17fdobridge: <karolherbst🐧🦀> mhhh
18:17fdobridge: <karolherbst🐧🦀> well
18:17fdobridge: <karolherbst🐧🦀> this is security related tho
18:18fdobridge: <karolherbst🐧🦀> I'll dig deeper, but at least from what I can tell.. this part doesn't do anything
18:18fdobridge: <karolherbst🐧🦀> I even provided a random garbage buffer address
18:18fdobridge: <karolherbst🐧🦀> nothing
18:20fdobridge: <karolherbst🐧🦀> thought it might fault, but nope (edited)
18:27fdobridge: <karolherbst🐧🦀> @airlied anyway.. nouveau with or without gsp on Ben's branch regresses display support 😦
18:35fdobridge: <Esdras Tarsis> Nice!
18:35fdobridge: <Esdras Tarsis> now update your tree and use the 530 version 🐸
19:38fdobridge: <J., Echo (she) 🇱🇹> When will NVK run Crysis?
19:45fdobridge: <Mohamexiety> display seems to work using GSP too.
19:45fdobridge: <Mohamexiety> ```
19:45fdobridge: <Mohamexiety> Extended renderer info (GLX_MESA_query_renderer):
19:45fdobridge: <Mohamexiety> Vendor: Mesa (0x10de)
19:45fdobridge: <Mohamexiety> Device: NV172 (0x2216)
19:45fdobridge: <Mohamexiety> Version: 23.0.0
19:46fdobridge: <Mohamexiety> Accelerated: yes
19:46fdobridge: <Mohamexiety> Video memory: 10219MB
19:46fdobridge: <Mohamexiety> Unified memory: no
19:46fdobridge: <Mohamexiety> Preferred profile: core (0x1)
19:46fdobridge: <Mohamexiety> Max core profile version: 4.3
19:46fdobridge: <Mohamexiety> Max compat profile version: 4.3
19:46fdobridge: <Mohamexiety> Max GLES1 profile version: 1.1
19:46fdobridge: <Mohamexiety> Max GLES[23] profile version: 3.2
19:46fdobridge: <Mohamexiety> ```
19:46fdobridge: <Mohamexiety> had to use KDE wayland tho. KDE x11 worked for like 5 seconds and then the whole window manager just died
19:46fdobridge: <Mohamexiety> (this is on GA102)
19:46fdobridge: <Mohamexiety> also using the latest version of ben's repo. so kernel 6.3 and firmware 530.30.02
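The firmware file names seen in this log (`gsp-5258902.bin`, `gsp-5303002.bin`) appear to be the driver version with the dots removed, so `5303002` corresponds to 530.30.02. A small sketch of that mapping, assuming a fixed 3-2-2 digit split; the function name is hypothetical and other naming schemes are not handled.

```python
import re


def gsp_fw_version(filename: str) -> str:
    """Turn 'gsp-5303002.bin' (optionally with a directory prefix)
    into '530.30.02'. Assumes a 3-2-2 digit layout."""
    m = re.fullmatch(r"(?:.*/)?gsp-(\d{3})(\d{2})(\d{2})\.bin", filename)
    if m is None:
        raise ValueError(f"unrecognised GSP firmware name: {filename!r}")
    return ".".join(m.groups())


print(gsp_fw_version("gsp-5303002.bin"))
print(gsp_fw_version("nvidia/ga102/gsp/gsp-5258902.bin"))
```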
19:50fdobridge: <J., Echo (she) 🇱🇹> So Wayland better 😈
20:14fdobridge: <Esdras Tarsis> In this case, yes, I proved with glmark2
21:00fdobridge: <gfxstrand> Cool. In that case, we can just run the FALCON macro on context create whether or not GSP is enabled and just trust kernel setup for the non-GSP case.
21:01fdobridge: <karolherbst🐧🦀> ... _maybe_
22:29skeggsb: karolherbst: i'm 95% certain we use identical gr fw to nvidia (ours is probably a few revisions behind now, but that shouldn't matter)
22:29skeggsb: i've extracted and diffed them before
22:29skeggsb: probably just setup wrong
22:30skeggsb: we use a legacy context header still too
22:30skeggsb: mostly because i didn't want to deal with splitting it for our fw vs nv's
23:25fdobridge: <J., Echo (she) 🇱🇹> Why could nouveau runpm (nouveau.runpm=1) freeze my system? 🐸
23:47fdobridge: <J., Echo (she) 🇱🇹> Actually it doesn't freeze, but I get this
23:47fdobridge: <J., Echo (she) 🇱🇹> `kernel: nouveau 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible`
23:48fdobridge: <J., Echo (she) 🇱🇹> The NVIDIA blob driver says that runtime D3 isn't supported
23:56fdobridge: <karolherbst🐧🦀> well.. it's broken
23:57karolherbst: skeggsb: yeah.. no idea.. thing is, it didn't work with the one we've gotten. But let me take another look tomorrow and see if I can make it work
23:59fdobridge: <J., Echo (she) 🇱🇹> `kernel: nouveau 0000:01:00.0: gr: DATA_ERROR 0000009c [] ch 2 [00ffe46000 supertuxkart[4531]] subc 0 class c597 mthd 0d78 data 00000004`
23:59fdobridge: <J., Echo (she) 🇱🇹> Ooops 🤔
23:59fdobridge: <J., Echo (she) 🇱🇹> glxgears works though
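When triaging a `gr: DATA_ERROR` line like the one above, the useful parts are the trailing hex fields: class, method and data. A minimal sketch that pulls them out of a log line; the regex only covers the field order seen in this log, and the helper name is made up.

```python
import re

_GR_FIELDS = re.compile(
    r"class (?P<cls>[0-9a-f]+) mthd (?P<mthd>[0-9a-f]+) data (?P<data>[0-9a-f]+)"
)


def parse_gr_error(line: str) -> dict:
    """Extract class, method and data (all hex) from a nouveau gr error line."""
    m = _GR_FIELDS.search(line)
    if m is None:
        raise ValueError("no class/mthd/data fields found")
    return {name: int(value, 16) for name, value in m.groupdict().items()}


line = (
    "nouveau 0000:01:00.0: gr: DATA_ERROR 0000009c [] ch 2 "
    "[00ffe46000 supertuxkart[4531]] subc 0 class c597 mthd 0d78 data 00000004"
)
fields = parse_gr_error(line)
print({name: hex(value) for name, value in fields.items()})
```

The class value can then be matched against the class headers (here `c597`, the same class as the `NVC597_*` methods mentioned earlier) to look up which method was rejected.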