00:16imirkin: anholt: karolherbst: jfyi, nv50 only supports images/ssbo on compute. shared memory indirect address must use an addr reg, while global memory indirect address must use gpr reg.
00:16karolherbst: imirkin: yeah.. but you did some changes to from_tgsi which I think from_nir wasn't updated with
00:16imirkin: entirely possible.
00:16karolherbst: and I am not sure if I got the FILE_ADDRESS thing all correct either :)
00:17imirkin: ohhh yeah. that buffer remapping thing. definitely needed in nir ;)
00:18imirkin: yeah, both of those changes need to get taken into account, at least
00:18karolherbst: yeah, I'll just need to think about on how to port it over
00:19karolherbst: images go into a gmem slot?
00:19karolherbst: now that's annoying
00:21karolherbst: imirkin: although I guess in nir we could just rewrite the nir_variables driver_locations in one pass iterating over images + buffers
00:21karolherbst: and be done with it
00:22karolherbst: or is there more to it?
00:23karolherbst: well.. filling nv50_gmem_state, but that would be part of that lowering
00:23karolherbst: yeah.. should be quite easy
00:29imirkin: karolherbst: yeah, it's not hard
00:29imirkin: just have to remap 2 spaces into 1
00:30imirkin: and there's a map that tells you how to do it
00:30imirkin: but very important to do, if you don't, things go sour ;)
00:30karolherbst: mhh? I thought the map is the output of codegen?
00:31imirkin: oh. it might be?
00:31imirkin: sorry, been a little while since i've done it
00:31karolherbst: or you mean bufferIds and imageIds, but that's TGSI internal stuff
00:31karolherbst: my idea was to just fixup the nir itself
00:31karolherbst: and make ssbos and iamges share the same index
00:31imirkin: anyways, it has to be a single space.
00:31imirkin: it's all g internally
00:32imirkin: and when configuring the stuff in the pushbuf, the addresses have ot line up
00:32karolherbst: yeah.. so if image0 and image1 have driver_location 0, 1 and ssbo0,1 have the same, I can just loop over both with an i++ or something.. I just have to check how GL images and ssbos look like :D
00:32imirkin: something like that.
10:31karolherbst: my motherboard is weird.. it says something about non UEFI compatible GPU and the it still displays the firmware stuff..
10:49karolherbst: uhh.. the CTS be like "SPV_ENV_UNIVERSAL_1_6"
12:16karolherbst: anholt: okay.. I got my gt215 up an running and will fix the ssbo stuff with nir :)
12:51karolherbst: [ 5172.552283] pcieport 0000:00:01.0: AER: Corrected error received: 0000:00:01.0
12:51karolherbst: [ 5172.559726] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
12:51karolherbst: [ 5172.570014] pcieport 0000:00:01.0: device [8086:460d] error status/mask=00001000/00002000
12:51karolherbst: [ 5172.578691] pcieport 0000:00:01.0:  Timeout
12:51karolherbst: gnome does run though
13:47karolherbst: imirkin: Passed: 1/1 (100.0%) :)
13:47karolherbst: not sure if it's correct if ssbos and images are mixed, but I think it is :D
13:48karolherbst: Passed: 606/747 (81.1%)
13:48karolherbst: Failed: 4/747 (0.5%)
13:48karolherbst: Not supported: 137/747 (18.3%)
13:48karolherbst: for "dEQP-GLES31.functional.image_load_store.*"
13:48karolherbst: something is still wrong with ssbos.. mhh
13:52karolherbst: fun.. the one shader I am looking at is even equal
13:52karolherbst: just two BBs have different ids
13:53karolherbst: maybe I forget to export something...
13:56karolherbst: ehhh but that's weird no? because those tests use ssbos to export stuff
14:01karolherbst: ahh.. memory corruption :)
14:01karolherbst: ehh wait, that wasn't it
14:36karolherbst: ohh, I think I found it
14:42karolherbst: ssbos use explicit_binding :)
14:44karolherbst: okay, but now some image_load_store things fail
14:45karolherbst: ehh wait
14:46karolherbst: it's binding :)
14:50karolherbst: image_load_store: Failed: 4/747 (0.5%)
14:51karolherbst: ssbo: Failed: 294/2061 (14.3%)
14:53karolherbst: uhh.. right
14:59karolherbst: next try :)
15:09karolherbst: anholt: there you go: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16142
17:10anholt: nouveau nir seems to be using GL for group barriers, while nir is doing TGSI. what's the right answer?
17:11anholt: let me try saying that again.
17:11anholt: ... while TGSI is doing CTA
17:21anholt: what does SUBOP_SHIFT_WRAP do?
17:27karolherbst: anholt: SUBOP_SHIFT_WRAP so bits shifted out comes back on the other end
17:27karolherbst: or was it the other way around?
17:27anholt: it looks like you're setting it on shl/shr
17:27karolherbst: I always forget
17:27anholt: and tgsi isn't
17:27karolherbst: yeah, which is the correct thing
17:27anholt: and I'm amazed anything works
17:28karolherbst: TGSI and nir don't match there
17:28anholt: uh what?
17:28karolherbst: optimizations something
17:28karolherbst: there was a reason I put it there
17:28anholt: tgsi and nir have matching semantics.
17:28karolherbst: anholt: I think nir has stronger semantics, because of opts
17:28anholt: oh, is this about wrapping on the shift count?
17:28karolherbst: and TGSI never cared
17:28karolherbst: anholt: yes
17:29karolherbst: ahh right
17:29anholt: so not bits shifted out coming back on the other end, but the "& 0x1f" on the shift count
17:29karolherbst: my explenation was crappy :)
17:29karolherbst: anholt: right.. that was it
17:29anholt: except, again, tgsi and nir have the same semantics for that.
17:30karolherbst: I think it matters for opts
17:30karolherbst: so nir optimizes aggressively and depends on wrapping behavior
17:30anholt: so you're saying that tgsi backend doesn't correctly implement the tgsi semantics. ok.
17:30karolherbst: anyway... it's required
17:31karolherbst: I am saying that how it is today is the correct thing :D
17:31karolherbst: some tests fail without it
17:32karolherbst: anholt: it could also be that codegen passes were the culprit
17:34karolherbst: anholt: so TGSI docs specify "The shift count is masked with 0x1f before the shift is applied."
17:34anholt: and nir is the same.
17:35karolherbst: mhh, seems to be indeed the case, maybe it changed
17:37karolherbst: anholt: yeah soo.. TGSI would have to use _WRAP as well
17:37karolherbst: but maybe that's fine because of ... something
17:37karolherbst: I think when I looked into it, it was because of optimizations and nir making heavily use of its definition, where TGSI doesn't
17:38karolherbst: dunno.. it's a long time ago
18:53anholt: well, I've got no clue on the new fails like in_invocation.image_alias_overwrite and compute.basic.ubo_to_ssbo_single_invocation
19:01karolherbst: anholt: if you have a nice list of that, I can take a look
19:07anholt: last commit of the MR
20:03karolherbst: mhh... something is broken with inputs
20:05karolherbst: ehh nope
20:05karolherbst: we don't set local mem limits correctly
20:06karolherbst: but why..
20:07karolherbst: uhh ohhh...nooo
20:07karolherbst: gr: TRAP_MP - TP0: 00000010 [LOCAL_LIMIT_WRITE]
20:07karolherbst: gr: 00200000  ch 5 [001f74c000 Xorg] subc 3 class 8597 mthd 15e0 data 00000000
20:08karolherbst: imirkin: I'd assume that means we write too much?
20:08karolherbst: but the last write is 36: st u64 # l[0x40] $r0d (8)
20:08karolherbst: and tlsSize is 72
20:17karolherbst: mhhhh "shl u32 %a110 %r167 0x00000000"
20:18karolherbst: not saying that this is a problem, but.. :D
20:20karolherbst: ahh.. wow
20:20karolherbst: that's an annoying fail
20:22karolherbst: anholt: mind giving this patch a try? https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/50741e7cbf9ba4a3396170d1485ed32f8d8ac20a
20:23karolherbst: I think that's fundamental enough to fix random regressions, so makes sense to rerun the tests on your g92
20:29karolherbst: okay.. what about "dEQP-GLES2.functional.rasterization.limits.points"
20:29karolherbst: ehh fails with TGSI as well
20:32CodeAgain: Hello there! Just wanted to confirm I understood what the nouveau online docs mean by SLI support: is it the support to alternate between discrete GPU and nvidia GPU?
20:35karolherbst: CodeAgain: SLI support just means whatever official SLI nvidia provides. SLI is usually to balance work between multiple discrete nvidia GPUs
20:35karolherbst: gr: TRAP_MP - TP0: 00000100 [GLOBAL_LIMIT_READ]
20:35karolherbst: yeah well.. duh
20:36karolherbst: but what is wrong here..
20:37CodeAgain: Ohhh, I see, so then I do not have the setup mentioned in the page LOL
21:17karolherbst: anholt: soo.. I think I resolved most regressions now :)
21:18karolherbst: even that sync stuff passes
22:12anholt: karolherbst: which testcase was your fix supposed to fix?
22:12anholt: the tlsspace one
22:28karolherbst: quite a lot ones
22:29karolherbst: anholt: it fixes like those sync fails, and the dEQP-GLES31.functional.compute.basic ones
22:29karolherbst: but some of the above fails as well
22:29anholt: sync fails are from the ssbo vs image fix.
22:29karolherbst: anyway.. I also left a comment on your ssbo/image patch
22:29karolherbst: ahh.. could be
22:29karolherbst: anyway.. both those changes should fix most of the regression
22:29anholt: and the tlsspace fix doesn't change the remaining compute.basic fails
22:30karolherbst: huh... wait
22:30karolherbst: it does fix KHR-GL33.shaders.arrays.declaration.dynamic_expression_array_access_fragment and KHR-GL33.shaders.arrays.declaration.dynamic_expression_array_access_vertex
22:30karolherbst: and some othrs
22:30karolherbst: and "dEQP-GLES31.functional.compute.basic.copy_image_to_ssbo_large" passes here as well now
22:31anholt: ok, so the KHR-GL33 regressions. great!
22:31anholt: compute.basic was also fixed by the ssbo/image fixup
22:31karolherbst: anyway.. I'd run the stuff with my two fixes and see what's left
22:31anholt: there are 4 fails left in that group
22:32karolherbst: which group?
22:34karolherbst: ahh, okay
22:34anholt: MR updated
22:36anholt: cool, a few piglits are cleaned up as well
22:37karolherbst: anholt: ahh yeah...
22:37karolherbst: that's the binding part
22:37karolherbst: we can't start at 0
22:38karolherbst: decl_var ssbo INTERP_MODE_NONE Output sb_out (0, 0, 1)
22:38karolherbst: so the first ssbo starts at binding 1 here
22:38anholt: I'm not following: "we can't start at 0"?
22:39karolherbst: the first buffer is at binding 1, not 0
22:39karolherbst: but you bind it at 0
22:40karolherbst: I did iterate over all the args, so I read out var->data.binding to use that instead of an iterator starting at 0
22:40karolherbst: hence me mentioning the gaps this can cause (like what if you got ssbos at 2, 6 and 14) and then 4 images
22:41anholt: oh, we get num_ssbos = 1
22:41karolherbst: but I guess num_ssbos would be 3 and we would just have the gmem array populated correctly,
22:41anholt: it's not about gaps, it's that num_ssbos is not max binding + 1
22:41karolherbst: anholt: which is correct :)
22:42karolherbst: the gaps might just be an issue if we run out of space, but if images point into the gmem table directly that shouldn't be an issue if we offset by num_ssbos
22:42karolherbst: which I think would be correct
22:43karolherbst: I am just not quite sure how that affects ssbos if we leave those indicies alone
22:43anholt: num_ssbos being a misleading number is also a problem for nir_lower_amul
22:43karolherbst: why is it misleading?
22:43karolherbst: there is one ssbo, it just has an explicit binding
22:43anholt: it's not "the highest location used by your lowered intrinsics plus one"
22:43anholt: so it's not a number you can use for anything
22:44karolherbst: I am sure that was never the idea of num_ssbos
22:44karolherbst: ssbos can have explicit bindings
22:44karolherbst: yeah.. the value might be useless in this case, that's correct
22:45karolherbst: I just don't know if we need to fix up ssbo accesses
22:45karolherbst: but probably yes, because they'll point into the gmem table
22:45karolherbst: which means we'd also need to fixup images and a plain offset might not be enough
22:48karolherbst: anyway.. it would be easier if we would have a way to specify a "driver_location" for ssbos and images and just use that, and populate the gmem array with gmem[driver_location] = .slot: var->data.binding
22:50CodeAgain: Regarding the message I sent earlier about SLI, I'm writing some kernel patches lately and I do have a laptop with a nvidia card (in a multi-card setup, but not a multi-nvidia setup)... I was thinking if I could do some minor patches for the driver without knowing much (honestly nothing at all) about graphic cards LOL
22:51karolherbst: CodeAgain: in this case it might be a bit problematic, but code clean ups are usually a good idea for newcomers, or.. dunno.. try to fix annoying bugs or something
22:52CodeAgain: karolherbst, I see, yeah, I can do some cleanups, sounds good!
22:52CodeAgain: There is something annoying me on my card, but I don't think it could be fixed in the driver LOL
22:53karolherbst: you never know
22:54karolherbst: if it's hit by all the identical cards, one could add a workaround based on PCI IDs
22:58anholt: karolherbst: wait. the intrinsic uses the .driver_location, not the .binding. why do you think the ssbo var's .binding is relevant in, say, dEQP-GLES31.functional.compute.basic.ubo_to_ssbo_single_group
22:59karolherbst: anholt: because the binding is set to a non 0 value
22:59karolherbst: but yeah... normally driver_location should be the correct value
22:59anholt: the intrinsic doesn't use .binding. the nouveau nir frontend doesn't use .binding.
23:00anholt: remapping bindings to driver locations is handled by mesa/st
23:00anholt: (these statements I made are about ssbos, not about images, though!)
23:01karolherbst: mhh yeah.. normally I'd expect that all of this should just work, especially given, that TGSI also uses 0 here
23:02karolherbst: maybe something else is broken
23:09karolherbst: I am a little worried my gt215 is diying :(
23:11karolherbst: no idea what's wrong there
23:13karolherbst: could be some mistake in the shader
23:24karolherbst: imirkin: do ubo indirects need to be inside FILE_ADDRESS?
23:31anholt: yeah, just adding this to load_ubo seems to help:
23:31anholt: if (indirectOffset)
23:31anholt: indirectOffset = mkOp1v(OP_MOV, TYPE_U32, getSSA(4, FILE_ADDRESS), indirectOffset);
23:31karolherbst: .. yeah :D
23:31karolherbst: ehh https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/b3aff34499f6a32ff595722a65055db5eee78920
23:32karolherbst: I think I need to tidy up that FILE_ADDRESS part at some point, but... I would also be fine with emiting it for all indirects and clean it up later in cases it's not needed.. mhh
23:32anholt: dubious of that version,
23:32anholt: since that getIndirect() is called twice in an intrinsic impl
23:32karolherbst: oh well
23:32anholt: so presumably one of the indirects should be in the address.
23:33karolherbst: I don't know if FILE_ADDRESS is only needed for the offset or not though
23:33karolherbst: but maybe it makes sense to emit FILE_ADDRESS more targeted
23:33karolherbst: I let CI decide
23:38anholt: ok, MR updated again
23:38anholt: down to just the sampler cube/2darray shadow crashes
23:39karolherbst: mhhhh, yeah..
23:40karolherbst: what's actually happening there...
23:41karolherbst: ohhh... the heck
23:43karolherbst: TGSI: TEX
23:43karolherbst: nir: TXL
23:44anholt: nir tex lowering likes to put a lod 0 in on non-FS stages because many drivers want that.
23:44karolherbst: I guess we don't
23:44anholt: I can sort that out easy enough
23:44karolherbst: we only call lower_tex with lower_txp
23:45karolherbst: yeah, ... we can encode the lod directly into the tex ops
23:45karolherbst: well.. some lods
23:47karolherbst: anholt: yeah soo, what happens is that texlod needs to be lowered for 3D stuff as it seems (or maybe other combinations), so knowing we have a lod0 helps us not doing that :)
23:48karolherbst: but despite that, the code should compile.. so let me see if this is just some silly bug somehwere
23:49anholt: ok. I'll sort out the nir side, anyway.
23:49karolherbst: yeah, should help with performance or something :)
23:49karolherbst: compSize is... 10?
23:50karolherbst: huh merge - %r213 %r205 %r206 %r129 %r209 %r110 (0)
23:50karolherbst: ra is a little weird here
23:50karolherbst: ahh yeah..
23:50karolherbst: it tries to merge 6 values into a quad for the tex op :)
23:51karolherbst: (I still hate, that we do this tex stuff as part of RA)
23:52karolherbst: ohh no, we try to merge 5 values into a quad
23:53karolherbst: three cords
23:53karolherbst: the comperator and the lod
23:56karolherbst: that's for imirkin to figure out :D
23:56karolherbst: although that combination might simply be not legal or something
23:56karolherbst: or a patch never tested
23:57karolherbst: or I place the stuff in the wrong order
23:58karolherbst: now that looks a bit better, but still fails RA
23:59karolherbst:pushes it onto his todo list