04:04 pmoreau_: karolherbst: that was on top of your branch, correct.
09:50 JayFoxRox: is https://github.com/mesa3d/mesa/blob/39d59cf87a3974142cb69dd52386d96b5e6e7dd9/src/mesa/drivers/dri/nouveau/nv10_state_tnl.c#L241 documented anywhere?
09:50 JayFoxRox: it's full of magic LUTs and values, but there's absolutely no description where it came from or what this math does / what the hw will do
09:50 JayFoxRox: I'm working on a non-nouveau driver (and emulator) for original Xbox [nv2a] and the MS / nvidia D3D driver actually has different LUTs and formulas
09:50 JayFoxRox: (however, the MS / nvidia driver also does the phi / theta thing for D3D instead of cutoff / exponent like GL) - I don't understand that code either
09:50 JayFoxRox: the D3D code is shorter and appears to have 32 entries in their LUT; I think they use the same LUT for what are `nv10_shininess_param` and `nv10_spot_params` in nouveau
10:04 AndrewR: pmoreau_, I compiled your branch (https://gitlab.freedesktop.org/pmoreau/mesa/-/tree/nv50_opencl_1.0_conformance) and tried LuxCore demo on it :} it failed, but at least for now it comes deeper into its path: https://pastebin.com/wXZyc73D
10:35 pmoreau_: Thanks for testing, AndrewR! I’ll have a look at that pastebin later.
10:38 karolherbst: uhhhh
10:38 karolherbst: spirv-val being stupid?
10:38 karolherbst: ohh wait
10:38 karolherbst: that's me
10:39 karolherbst: heh
10:39 karolherbst: AndrewR: mind dumping the OpenCL C file?
10:39 AndrewR: karolherbst, I already switched the mesa build to clover_support_hmm_wip_with_nv50, now the error is a bit different
10:40 karolherbst: ahhh
10:40 karolherbst: yeah.. just wanted to say that this error doesn't make much sense :)
10:41 AndrewR: karolherbst, https://pastebin.com/5rxeiRqW - probably memory limits set in this version helped .... to some degree
10:42 karolherbst: AndrewR: huh? I don't see an error
10:43 karolherbst: ohh, it segfaults
10:43 karolherbst: I see :D
10:43 AndrewR: karolherbst, it disappeared with a different branch (but it has some more commits from you)
10:44 AndrewR: karolherbst, https://pastebin.com/U3xLsQ9k gdb
10:45 karolherbst: yeah well...
10:45 karolherbst: no debug info
10:45 AndrewR: karolherbst, probably my MESON line installed stripped libs?
10:45 karolherbst: yeah
10:46 AndrewR: karolherbst, rebuilding without the 'strip' argument .... sorry, was copying the arg line from my normal build
11:14 AndrewR: karolherbst, https://pastebin.com/hzgEfBTK - hopefully more interesting gdb
11:17 karolherbst: interesting
11:17 karolherbst: seems like the spirv linker messes up
11:17 karolherbst: probably not a mesa bug then
11:18 karolherbst: pmoreau_ might be able to help out here
11:18 pmoreau_: Was in a meeting; going to look at it now.
11:25 AndrewR: karolherbst, is there a possibility it was due to the old vulkan-sdk? (on Slackware spirv-tools is folded into it) Rebuilding it now ....
11:26 pmoreau: I saw that “Invalid back or cross-edge in the CFG” error when running some of the CTS tests.
11:26 karolherbst: pmoreau: yeah.. doesn't matter, that's on top of master without the structurizer
11:27 karolherbst: AndrewR: could be
11:28 pmoreau: karolherbst: For me it was on top of clover_support_hmm_wip, IIRC.
11:28 pmoreau: Would have to double-check
11:28 karolherbst: huh weird
11:28 karolherbst: maybe you had an old version :p
11:28 pmoreau: Maybe
11:29 pmoreau: Looking at that linker error; looks like a segfault in the callback we pass to the spirv-tools.
11:29 karolherbst: ohhh
11:32 pmoreau: Though it does look like there is something weird with the SPIR-V the linker got, cause SPIRV-Tools was trying to print an error message for one of the instructions.
11:34 pmoreau: AndrewR: Would you have instructions on how I can reproduce your setup? (Mostly regarding the LuxCore bits.)
11:36 AndrewR: pmoreau, I just hacked their build system a bit ... commit 700751bdaf5a7d3fbff58a7130729cac559ad701 at https://github.com/LuxCoreRender/LuxCore and then built it as a usual CMake project (with a build dir, from where you run cmake ../). Moment, will pastebin the hack on top of it for the python3/numpy/boost thing
11:38 AndrewR: pmoreau, https://pastebin.com/EP5rj9RL
11:38 pmoreau: Thank you!
11:39 AndrewR: pmoreau, well, I was forced to rebuild boost on top of python 3.7, and then added numpy3 via pip3 install ....
11:39 AndrewR: pmoreau, boost-1.69.0-i586-3 because boost 1.7x broke bunch of stuff for me ....
11:42 AndrewR: pmoreau, and you need embree3, too (my idea was to hack it out too, for some non-x86 arch, but I didn't finish this idea)
11:44 AndrewR: pmoreau, #define RTC_VERSION_STRING "3.9.0" from /usr/local/include/embree3/rtcore_config.h (I just dumped it there w/o any packaging)
11:46 AndrewR: pmoreau, ah, they have 3.10 for now... https://www.embree.org/
11:50 AndrewR: karolherbst, pmoreau it seems the segfault is gone with vulkan-sdk as from ftp://ftp.slackware.com/pub/slackware/slackware-current/source/x/vulkan-sdk - just minus wayland
11:51 AndrewR: https://pastebin.com/TLk0d9Vk
11:53 AndrewR: https://imagebin.ca/v/5M9ENdiIychj - probably not how it should look ....
11:57 pmoreau: AndrewR: Oh, good to know regarding the updated SDK. Which version of SPIRV-Tools were you using before, and which version are you using now? (You can get it by running `spirv-link --version`.)
11:58 AndrewR: https://pastebin.com/vBpvXgJ9 - luxcoreui after I switched it from the default renderer to the OCL path ...
11:58 pmoreau: And yes, I would agree with the results not looking right
11:59 pmoreau: “SPIR-V id 884 is the wrong kind of value” I don’t think I have seen that one before.
11:59 AndrewR: pmoreau, spirv-link --version
11:59 AndrewR: SPIRV-Tools v2020.1-dev unknown hash, 2020-05-11T11:29:41 - hm, it seems it just printed build time
11:59 AndrewR: pmoreau, I'll rebuild mesa fully just in case ....
11:59 pmoreau: That’s the version you are using right now?
12:00 AndrewR: well, I think yes? Those are packaged by a SlackBuild script, but I'm not sure if ninja rebuilt/relinked everything correctly from a dirty build dir
12:00 AndrewR: of mesa
12:01 pmoreau: I’ll build Lux as soon as the kernel is done building; can’t have too many builds going on at the same time on this old laptop.
12:07 AndrewR: pmoreau, https://github.com/LuxCoreRender/LuxCore/commit/659df583a3c36940e2cbc3698e1cdb896ea64dea - maybe I just need to update Lux too
12:09 pmoreau: I don’t think that would change much, but who knows.
12:44 AndrewR: pmoreau, https://pastebin.com/fWJ1Pmx9 - luxcoreui log (initially it renders black scene, and if I switch engine/Path from CPU to OCL ... this error appear)
12:46 pmoreau: Ah, back to the error you had earlier.
13:17 AndrewR: pmoreau, and kitchen scene produces another one: https://pastebin.com/qhmxamYa - unhandled opencl opc: 89
13:19 imirkin: JayFoxRox: best thing i can offer is to look at whether mwk RE'd it as part of his nvhw efforts
13:20 karolherbst: ohh
13:21 karolherbst: what's 89
13:21 karolherbst: Native_powr
13:21 karolherbst: :O
13:21 karolherbst: I should rebase on master
13:21 karolherbst: we have all natives in now
13:26 imirkin: JayFoxRox: hm ... doesn't seem like he has much beyond basic checks. however a grep suggests that R200 also has a LIGHT_SPOT_CUTOFF -- perhaps similar?
13:32 AndrewR: karolherbst, so I can just apply https://cgit.freedesktop.org/mesa/mesa/commit/?id=a698c2eedba8195a6486cfb3a2a61dd9fcfa31bb onto the branch? (not sure if it will actually make the scene appear ...)
13:39 AndrewR: karolherbst, applied this and two before it ... recompiling
13:40 karolherbst: yeah
13:40 karolherbst: should just work
13:58 AndrewR: karolherbst, yes, it starts now .. But the window with the image is still black :} (luxcoredemo also generated a normal.png file, it was also black ... as opposed to the color part I posted to imagebin) - some error in building those normals?
13:58 karolherbst: dunno
13:59 karolherbst: hard to say what's up without debugging
14:01 AndrewR: karolherbst, ok (luxcorescenedemo also outputs a few black pngs)
14:02 karolherbst: yeah.. but we have to fix our known CTS bugs first as well
14:02 karolherbst: although there could also be random errors in the logs
14:02 karolherbst: hard to tell
14:02 pmoreau: Trying to build LuxCore again using one of the AUR packages instead; I ran into some -fPIC issues earlier.
14:05 pmoreau: karolherbst: FYI, here are the basic tests that are crashing/asserting, and some quick description about the issue: https://pastebin.com/zgLQDMQc
14:18 AndrewR: karolherbst, pmoreau - there are also some dmesg errors: https://pastebin.com/t1VdrYNt
14:19 karolherbst: pmoreau: yeah..
14:19 karolherbst: once I get the core stuff done :D
14:19 imirkin: CUDA_FAULT - Global read fault at address 0083dffffc
14:19 imirkin: that's probably not great.
14:19 karolherbst: mhh
14:19 pmoreau: No worries, Karol :-)
14:19 karolherbst: sounds like OOB stuff
14:19 karolherbst: or null pointer access stuff
14:19 karolherbst: well
14:20 karolherbst: it will still take some time until we get there
14:21 pmoreau: I had to tweak nv50_ir_from_nir to get some read and writes to work. So I wouldn’t be surprised if I messed things up partly, or if more tweaking needs to be done.
14:23 pmoreau: Looking at my logs from yesterday's CTS run, I also have a bunch of “Global write fault” and some CP_NO_REG_SPACE_STRIPED
14:23 pmoreau: Oh, and one invalid opcode as well. :-D
14:24 imirkin: at least it prints the invalid opcode on nv50
14:24 imirkin: the compute opcode encoding stuff is not super-well-tested
14:24 pmoreau: Indeed: “TRAP_MP_EXEC - TP 0 MP 1: 00000010 [INVALID_OPCODE] at 000258 warp 7, opcode 2c00c019 04220784”
14:25 imirkin: looks like a shared STORE
14:26 imirkin: envydis decodes it thusly: 00000000: 2c00c019 04220784 add b32 $r6 b32 s[$a7] $r8
14:27 imirkin: unfortunately my version of nvdisasm no longer supports SM10
14:28 pmoreau: Mmh, does add accept a shared as first argument? Or is it only for the second one?
14:28 imirkin: oh wait, no, there it goes
14:28 imirkin: /*0000*/ IADD R6, g [A7+0x0], R8;
14:28 karolherbst: ohh
14:28 karolherbst: interesting
14:28 imirkin: sooo ... not COMPLETELY identical :)
14:28 pmoreau: :o
14:28 karolherbst: imirkin: did you add the CP shader type to envydis?
14:28 karolherbst: allthough
14:28 imirkin: of course
14:28 pmoreau: I mean, 's', 'g', who cares: it’s still a letter! ;-)
14:29 karolherbst: :D
14:29 imirkin: envydis -m g80 -V g84 -O cp -w
14:29 karolherbst: yeah..
14:29 karolherbst: I expect there to be a couple of encoding issues
14:29 karolherbst: but pmoreau wants to fix those :p
14:29 imirkin: the fact that they disagree is rather unfortunate.
14:30 pmoreau: I can not RE those though, but I can test fixes. :-D
14:30 imirkin: you can run stuff through nvdisasm
14:30 pmoreau: I should be able to do that
14:31 pmoreau: :-/ Still getting that -fPIC error.
14:35 imirkin: hm, interesting
14:36 imirkin: add b32 $r6 b32 g[$a7] $r8
14:36 imirkin: stdin:1.1-2.1: No match
14:36 imirkin: so i guess it's just backwards
14:36 imirkin: pmoreau: try to see what the original shader wanted btw
14:37 pmoreau: I’ll try to
14:38 imirkin: and then the next step is to figure out why hw doesn't like that op
14:38 imirkin: even though nvdisasm says it's fine
14:50 pmoreau: Found the test
14:50 pmoreau: Let’s see..
14:51 imirkin: g[$a7] is dodgy btw - address regs are only like 20-bit or something
14:58 pmoreau: I think I had tweaked Nouveau to avoid using the address regs.
15:02 pmoreau: Arf, I see a pattern between what codegen thinks it is generating, and what envydis tells me: all the `s[$rX+0xYZ]` end up as `s[$aW]` where `$aW` is never set.
15:02 pmoreau: For example, `ne $c0 ld u32 $r0 s[$r1+0x0]` becomes `(lg $c0) ld $r0 b32 s[$a2]`
15:06 imirkin: right
15:06 imirkin: probably less-than-ideal
15:07 imirkin: but i don't think s[$r1] is a thing
15:07 imirkin: i think you can only use $a with shared
15:07 imirkin: although looking at it, doesn't seem like g[$r1] is a thing either
15:07 imirkin: might be stuck with the address regs
15:09 pmoreau: g[$rX] did work for me AFAICR; I never used the address regs when I implemented my SPIR-V -> NV50 IR translator.
15:09 imirkin: ah ok
15:09 imirkin: maybe i just can't get envyas to synthesize it
15:09 pmoreau: Does this work? `d00f0401 a0c00280 (lg $c0) st b32 g15[$r2] $r0`
15:10 imirkin: yes
15:10 imirkin: but no ld, apparently
15:10 pmoreau: Might have been missing the number after the g
15:10 imirkin: oh yes
15:10 imirkin: i was
15:10 pmoreau: "d00f0c19 80c00780 ld b32 $r6 g15[$r6]"
15:10 imirkin: there it goes
15:10 imirkin: ld b32 $r0 g0[$r1]
15:10 imirkin: 0xd0000201,
15:10 imirkin: 0x80c00780,
15:11 pmoreau: \o/
15:11 imirkin: but no such match with s[$r1]
15:11 imirkin: well at least this is nice ... a logical explanation to what's going on
15:11 pmoreau: Indeed
15:12 pmoreau: But now I need to understand how to use address regs, instead of continuing to ignore them. :-D
15:12 imirkin: at the nv50 ir level ... pretty easy
15:12 imirkin: it's already done in all the graphics stages
15:12 imirkin: so you just need to re-enable that, and have it skip for gmem
15:14 pmoreau: Basically undo https://pastebin.com/7Yhbm1Zv
15:14 pmoreau: I think
15:15 imirkin: mmmmm
15:15 imirkin: well
15:15 imirkin: no
15:15 imirkin: you should never be getting indirection on 1
15:15 imirkin: only on 0
15:15 imirkin: i.e. when are you seeing src.isIndirect(1) ?
15:16 imirkin: anyways, that stuff seems completely wrong
15:16 pmoreau: Good question; I just copy/pasted that from my old translator, cause it seemed to help with loads when using Karol’s NIR->NV50 IR work.
15:16 karolherbst: imirkin: but global loads are two dimensions on tesla, no?
15:16 pmoreau: But I can’t remember why I did that
15:17 imirkin: karolherbst: never indirecton on second dim though
15:17 karolherbst: true...
15:17 karolherbst: weird
15:17 imirkin: pmoreau: i expect that your "else" case is fine. but it just needs to be a if compute && sym->inFile(FILE_SHADER_INPUT) -> redo it as shared
15:17 karolherbst: ohh
15:17 karolherbst: what about const buffers?
15:18 imirkin: but that has nothing to do with lowering to address registers
15:18 imirkin: karolherbst: no
15:18 imirkin: not on nv50
15:18 karolherbst: ohhh
15:18 imirkin: karolherbst: only case is in geom shaders.
15:18 imirkin: since you can indirect in an input array, and on vertex
15:18 karolherbst: pmoreau: mind sharing the nv50ir output when hitting this issue?
15:20 pmoreau: Sure
15:20 imirkin: (i think karolherbst means when that indirect thing happens, not for the register/address thing)
15:20 karolherbst: yeah
15:21 imirkin: (and i don't think it ever does, it's just bogus code ported over)
15:21 karolherbst: although I think this was just to fix the offset when loading kernel inputs :)
15:21 karolherbst: the patch I mean
15:21 imirkin: exactly
15:21 imirkin: hence my comment above.
15:22 karolherbst: but I was under the assumption I fixed it.. partly
15:22 pmoreau: So you want to see how it looks like for a simple load without my tweak, right?
15:23 karolherbst: pmoreau: https://github.com/karolherbst/mesa/commit/33a9b9fce5a660f19a7a79b34bb414f264abd49b
15:23 karolherbst: there I already use FILE_SHADER_INPUT
15:23 karolherbst: and FILE_SHADER_INPUT should get lowered to shared + that offset
15:23 karolherbst: maybe we don't add the offset in the lowering though
15:24 pmoreau: IIRC the offset was never being added
15:24 karolherbst: okay
15:24 karolherbst: I guess we should move that into the lowering
15:24 pmoreau: One sec, double-checking that
15:25 karolherbst: makes more sense there than in the IR converters
15:25 pmoreau: Right, without the patch the offset is not being added
15:26 pmoreau: `10000001 0423c780 102a8005 00000003 d00f0005 a0c00781` without the patch, `10000801 4400c780 102a8005 00000003 d00f0005 a0c00781` with the patch
15:26 karolherbst: yeah...
15:26 karolherbst: but then we need to fix the offset :)
15:26 karolherbst: and probably remove the offset when fromTGSI adds it
15:26 karolherbst: I wouldn't mind fixing fromNIR, but it makes more sense to have it in the lowering
15:27 pmoreau: My change was in `NV50LoweringPreSSA::handleLOAD()`, not in your nir pass. :-)
15:27 karolherbst: yeah.. I know :)
15:27 pmoreau: Though, something better could probably be achieved
15:27 karolherbst: wait...
15:28 karolherbst: pmoreau: += info->prop.cp.inputOffset
15:28 pmoreau: What is `a[]` again?
15:28 karolherbst: input
15:28 pmoreau: Ok
15:28 karolherbst: check Converter::makeSym in fromTGSI
15:29 karolherbst: but that also sounds a bit messy
15:29 karolherbst: ohh wait.. tgsiFile
15:29 karolherbst: ehh
15:29 karolherbst: yeah, whtvr
15:29 karolherbst: also if type == compute and file == SHADER_IN, file = SHARED and offset += info->prop.cp.inputOffset
15:29 imirkin: isn't that what i said?
15:29 karolherbst: probably :)
15:31 karolherbst: uhm...
15:31 karolherbst: how big is the shared memory btw?
15:31 karolherbst: on tesla that is
15:31 karolherbst: I guess it's not very big
15:31 imirkin: smaller than the number of bits in an address reg
15:32 imirkin: not even getting to loading msm_dri
15:32 karolherbst: the kernel input needs to be at least 1kb big.. I guess we have that much space
15:32 imirkin: wrong window on that last comment
15:37 pmoreau: So your comment, unless I misunderstood, was to switch to this: https://pastebin.com/cNZ7jyQ3 but that should be exactly the same as what I had before.
15:37 imirkin: pmoreau: why the if/else
15:37 imirkin: pmoreau: leave the old code alone
15:37 imirkin: pmoreau: and just add the if (...)
15:38 imirkin: and yes, that should be fine
15:50 karolherbst: pmoreau: also, please use info->prop.cp.inputOffset instead of the hardcoded 0x10
15:53 pmoreau: karolherbst: Changed
16:17 pmoreau: It does not seem like prop.cp.sharedOffset is ever set for nv50.
16:17 karolherbst: inputOffset
16:18 pmoreau: inputOffset is, but not sharedOffset
16:23 pmoreau: Loading kernel arguments needs to be offset by inputOffset
16:23 pmoreau: but also all regular loads and stores to shared mem need to be offset by sharedOffset
16:23 pmoreau: To compensate for the kernel arguments being in shared.
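The relocation rule discussed above (in compute shaders, kernel inputs become shared-memory accesses shifted by `inputOffset`, and regular shared accesses are shifted by `sharedOffset`) can be modelled as a tiny address-rewriting function. This is a hypothetical Python sketch, not the nv50_ir lowering code: `lower_address`, the `File` enum and the example offsets are illustrative assumptions; only the rule itself comes from the discussion.

```python
from enum import Enum


class File(Enum):
    """Toy stand-ins for the nv50 IR files discussed above (hypothetical)."""
    SHADER_INPUT = "a"  # kernel arguments, a[] in IR dumps
    SHARED = "s"        # shared memory, s[]
    GLOBAL = "g"        # global memory, g15[] etc.


def lower_address(is_compute, file, offset, input_offset, shared_offset):
    """Rewrite a (file, offset) access following the rule described for Tesla compute.

    Assumes kernel arguments are stored in shared memory starting at input_offset,
    and user shared memory starts at shared_offset, after the arguments.
    """
    if not is_compute:
        return file, offset  # graphics stages are left untouched
    if file is File.SHADER_INPUT:
        return File.SHARED, offset + input_offset  # argument moved into shared
    if file is File.SHARED:
        return File.SHARED, offset + shared_offset  # skip past the argument area
    return file, offset  # global accesses are not relocated


# Second 32-bit kernel argument (byte 0x4), assuming inputOffset = 0x10, sharedOffset = 0x30
print(lower_address(True, File.SHADER_INPUT, 0x4, 0x10, 0x30))
```

With `inputOffset = 0x10`, the value that had been hardcoded before switching to `info->prop.cp.inputOffset`, the argument at byte 0x4 lands at s[0x14].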
16:24 karolherbst: mhhh, makes sense
16:24 karolherbst: I don't think we have that meta data actually...
16:26 pmoreau: There is cp.sharedOffset, which is used in the TGSI frontend, but I don’t think we have the size of input arguments available.
16:26 karolherbst: mhhh
16:26 karolherbst: there are two things actually
16:26 karolherbst: input size for tesla and the shared memory buffers
16:26 karolherbst: but the latter are runtime variable
16:27 karolherbst: mhhh
16:27 karolherbst: pmoreau: well, that code is disabled in TGSI
16:28 pmoreau: Oh right
16:28 karolherbst: also, it's never set
16:28 karolherbst: just used
16:29 pmoreau: Correct, which is what I was referring to by “It does not seem like prop.cp.sharedOffset is ever set for nv50.”
16:29 karolherbst: shader_info should contain the input size
16:29 karolherbst: shader_info.cs is where I would put a field "input size" probably
16:29 karolherbst: let me check
16:30 karolherbst: and then we set that from inside clover
16:30 karolherbst: when doing the offset calculation
16:31 karolherbst: pmoreau: you see the loop inside spirv_to_nir?
16:31 karolherbst: just use that offset value and put it into shader_info.cs.input_size :)
16:31 karolherbst: offset after the loop should contain the size of all inputs
16:32 karolherbst: mhhhh
16:32 karolherbst: might need some alignment though
16:32 karolherbst: so we want to align it ourselves to 0x10 or something
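A minimal sketch of that size computation: walk the arguments, place each at its own alignment, and round the final offset up to 0x10 as suggested. `kernel_input_size` and the `(size, alignment)` argument list are hypothetical stand-ins; the actual loop in spirv_to_nir/clover may apply different alignment rules.

```python
def align(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def kernel_input_size(args, final_alignment=0x10):
    """Total size of the kernel argument block.

    args is a list of (size, alignment) pairs in declaration order
    (hypothetical stand-in for the per-argument offset loop).
    """
    offset = 0
    for size, alignment in args:
        offset = align(offset, alignment)  # place the argument at its alignment
        offset += size                     # offset now points past this argument
    return align(offset, final_alignment)  # pad the whole block to final_alignment


# Four arguments of 4, 4, 8 and 1 bytes
print(kernel_input_size([(4, 4), (4, 4), (8, 8), (1, 1)]))
```

The rounded value is what could be stored in a field such as the proposed `shader_info.cs.input_size`; the extra padding keeps whatever follows the arguments in shared memory aligned.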
16:33 pmoreau: One problem we could run into, is running out of shared memory: if the user assumes they can use all the shared memory available, but we run out due to the arguments being stored there.
16:33 karolherbst: pmoreau: we should advertise less then ;)
16:33 karolherbst: advertise 1k for the kernel input and the rest for shared mem
16:33 pmoreau: Also once we have loaded the arguments into registers, we no longer need to hold onto that shared mem.
16:33 karolherbst: or so
16:34 karolherbst: pmoreau: I am sure that's not significant
16:34 karolherbst: 1k isn't that much
16:34 karolherbst: anyway, we can do perf opts later :)
16:37 pmoreau: We are already advertising 16 KB, which is the minimum value for being OpenCL 1.0 compliant.
16:37 karolherbst: pmoreau: 16k for shared mem, right?
16:37 pmoreau: yes
16:37 karolherbst: how much do we have?
16:38 pmoreau: Looking it up
16:42 pmoreau: CUDA docs are referring to max 16 KB per SM, but that does not really say how much there actually is.
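The concern about running out of shared memory comes down to simple arithmetic. A sketch, assuming the CUDA figure of 16 KB per SM is all the shared memory there is, that s[0x0:0x10] is reserved for ntid & co, and that 1 KB is set aside for kernel inputs as suggested earlier; all three numbers are assumptions taken from the discussion, not values read from the hardware.

```python
CL_1_0_MIN_LOCAL_MEM = 16 * 1024  # minimum local memory size required by OpenCL 1.0


def user_visible_shared(total_shared, builtin_bytes, input_reserve):
    """Bytes of shared memory left for the kernel's own __local data (assumed layout)."""
    return total_shared - builtin_bytes - input_reserve


left = user_visible_shared(16 * 1024, 0x10, 1024)
print(left, left >= CL_1_0_MIN_LOCAL_MEM)
```

Under these assumptions the user-visible amount falls below 16 KB, so either the hardware has more shared memory than the CUDA docs state, or the advertised value and the argument reservation cannot both hold.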
16:44 karolherbst: yeah...
16:44 pmoreau: Not sure if we have a way to read it out from the hardware itself.
16:44 karolherbst: mhhhhhh
16:44 karolherbst: wait..
16:44 karolherbst: we can check what nvidia does actually :D
16:45 pmoreau: Pfff, too easy
16:45 pmoreau: :-D
16:45 karolherbst: ohh, I still have my kernel to check how many in kernel constants there can be :D
16:45 RSpliet: Don't arguments go into constmem instead of local/shared mem?
16:46 karolherbst: pmoreau: ohh, maybe you know it: do you know if CL states that the amount of constants inside a kernel has to be within the const buffer limit?
16:46 karolherbst: RSpliet: tesla
16:47 RSpliet: but edison... do they not support constmem?
16:47 pmoreau: There is CL_DEVICE_MAX_PARAMETER_SIZE that gives you the “Max size in bytes of all arguments that can be passed to a kernel.”
16:47 pmoreau: No idea about the constants
16:48 karolherbst: ehhhh
16:48 karolherbst: I hate new nvidia drivers :D
16:49 pmoreau: Well, you would want to check against an old one, cause new ones no longer support Tesla. Or what are you trying to check?
16:49 karolherbst: yeah...
16:50 karolherbst: I compile cl to ptx and check what they do
16:50 karolherbst: but.. uff
16:51 karolherbst: ehhh
16:51 karolherbst: MOV R1, c[0x1][0x100]
16:51 karolherbst: why aren't we using const buffers? :O
16:52 pmoreau: I was looking at the wrong loop in the wrong spirv_to_nir :-D
16:52 karolherbst: c[0x0][0x20] used for the src ptr
16:52 RSpliet:sips on some tea
16:52 karolherbst: imirkin: so uhm.. why are we using shared for kernel inputs again?
16:52 pmoreau: Are you generating for Tesla? Or for some Fermi
16:53 karolherbst: SM10
16:53 karolherbst: I think
16:53 karolherbst: maybe I messed up
16:53 karolherbst: let me check
16:54 karolherbst: ahh shit..
16:54 karolherbst: cuda too new
16:54 karolherbst: only sm20
16:54 imirkin: SM11
16:56 pmoreau: I think you need something like CUDA <= 5
16:57 imirkin: 9
16:58 karolherbst: pmoreau: 6.5 :)
16:58 pmoreau: :-D
16:58 karolherbst: nvidia has all those nice fedora repos
16:58 karolherbst: so it's easy to check
16:59 karolherbst: what the hell
16:59 pmoreau: I should have looked on AUR:
16:59 pmoreau: aur/cuda65 6.5.19-2 (1) (0.00)
16:59 pmoreau: NVIDIA's GPU programming toolkit (for compute capability < 2.0)
17:00 karolherbst: https://gist.github.com/karolherbst/c2c1cdc92c94ac9c725e8ae5d482af39
17:00 karolherbst: that might also explain that s vs g confusion
17:00 karolherbst: g [0x4], vs global14[R1];
17:01 karolherbst: this.. thing is weird :D
17:01 simernes: Hi. My desktop environment (KDE) keeps randomly freezing. Any suggestions on how to troubleshoot? I can still ssh into the machine, but I can't open new ttys with ctrl+alt+fx. It seems to maybe be correlated with jumping between tabs in firefox and similar gui-ish actions although I'm not sure about that. Sometimes it freezes just for a while and I'm lucky in that it wakes up from the freeze, but
17:01 simernes: other times not.
17:01 simernes: I'm on void linux with kernel 5.4.40, with GTX 650 and nouveau.
17:01 karolherbst: what is g[]?
17:02 karolherbst: simernes: killing firefox/chromium might solve it
17:02 karolherbst: firefox most likely
17:02 karolherbst: or other applications using the GPU
17:02 karolherbst: simernes: can you confirm that killing firefox through ssh unfreezes it?
17:02 simernes: I will next time. You think it's out of memory related?
17:02 karolherbst: no
17:02 pmoreau: global14[] -> g14[] for us
17:03 karolherbst: pmoreau: IADD32 R1, g [0x4], R0; /* 0x2100e804 */
17:04 pmoreau: And g[] -> s[] for us, AFAICT
17:04 karolherbst: yeah..
17:04 karolherbst: but this is weird
17:04 pmoreau: It is weird
17:04 karolherbst: IADD32 R2, g [0x5], R0; /* 0x2100ea08 */
17:04 karolherbst: why that 0x5?
17:05 karolherbst: ohh
17:05 karolherbst: that's just nvdisasm being stupid
17:05 pmoreau: I’m thinking those could be accessing ntid & co
17:06 karolherbst: no
17:06 karolherbst: they just print in a stupid way
17:06 pmoreau: Cause s[0x0:0x10] is where ntid & co live on Tesla, which is why we start at 0x10
17:06 karolherbst: yeah
17:06 karolherbst: but the offsets are false
17:06 karolherbst: add b32 $r1 b32 s[0x10] $r0 and add b32 $r2 b32 s[0x14] $r0
17:07 karolherbst: [0x1].U16 -> 0x2
17:07 karolherbst: [0x4] -> 0x10
17:07 karolherbst: etc...
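The mapping spelled out above suggests nvdisasm prints these operands as element indices rather than byte offsets: a plain operand is scaled by 4 and a `.U16` operand by 2. A small hypothetical helper for converting them when comparing against envydis output; it covers only the two cases seen here, and other suffixes are deliberately left out.

```python
# Element sizes inferred from the examples above; other suffixes are not covered.
ELEMENT_SIZE = {"": 4, ".U16": 2}


def nvdisasm_to_byte_offset(index, suffix=""):
    """Convert an nvdisasm g [...] index into the byte offset envydis shows as s[...]."""
    return index * ELEMENT_SIZE[suffix]


for index, suffix in [(0x4, ""), (0x5, ""), (0x1, ".U16")]:
    print(f"g [{index:#x}]{suffix} -> s[{nvdisasm_to_byte_offset(index, suffix):#x}]")
```

This reproduces the pairs above: `g [0x4]` is s[0x10], `g [0x5]` is s[0x14], and `[0x1].U16` is s[0x2].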
17:07 karolherbst: okay.. anyway
17:07 karolherbst: that helps
17:07 pmoreau: Ehhh :o
17:07 karolherbst: soo.. let's see what happens if we use local memory buffers
17:09 karolherbst: ohhhhhhh
17:09 karolherbst: ehhh
17:09 karolherbst: pmoreau: it's too easy :p
17:09 karolherbst: guess what they do
17:11 karolherbst: but I guess there is no inherent benefit of fixing up the runtime value vs offseting in the kernel
17:12 pmoreau: Do they just offset?
17:12 pmoreau: They read the input size from a constant buffer or something?
17:12 karolherbst: at runtime
17:12 karolherbst: yeah
17:12 karolherbst: no
17:13 karolherbst: they read the offset from the constant buffer
17:13 karolherbst: but...
17:13 karolherbst: you can apply the kernel input offset outside the shader
17:14 karolherbst: mhhh.. nvidia supported CL1.1 for tesla though
17:14 pmoreau: Really?
17:14 karolherbst: apparently yes..
17:14 karolherbst: dunno
17:14 pmoreau: OpenCL 1.1 mandated 32 KB of shared memory
17:14 karolherbst: maybe wikipedia is wrong though :D
17:14 karolherbst: no
17:14 karolherbst: 1.1 mandates 16kb
17:14 karolherbst: or was that 1.0?
17:14 karolherbst: ahh, 1.1 already went to 32k
17:15 karolherbst: mhhh
17:15 pmoreau: Yeah, I was hoping we could get 1.1 or maybe 1.2 for Tesla, but it does not look like it because of that.
17:16 karolherbst: pmoreau: do you know what the kernel input size is reported by nvidia?
17:16 pmoreau: I do not
18:15 pmoreau: This will do for now: https://gitlab.freedesktop.org/pmoreau/mesa/-/commit/a417fac7bd6635d01e83dcc19bf576fd960db689
18:17 pmoreau: Got a few extra tests passing with this. Going to look into why I have so many tests crashing in vtn_variable_create() at `var->var->constant_initializer =`, due to var->var == NULL.
18:17 karolherbst: pmoreau: don't set inputOffset :D
18:17 karolherbst: nv50 already does so
18:18 karolherbst: ohh wait
18:18 karolherbst: I misread the diff
18:18 pmoreau: :-)
18:18 karolherbst: mhhh
18:18 karolherbst: I'd probably set sharedOffset from within codegen though
18:19 karolherbst: sharedOffset is more like a result of the compilation
18:19 karolherbst: than anything else
18:19 karolherbst: and we need to be careful due to alignment and stuff
18:19 karolherbst: I think...
18:20 pmoreau: I could set it only for nv50 in nv50_ir_from_nir
18:20 pmoreau: I considered doing that
18:20 karolherbst: currently wondering
18:20 karolherbst: how do we want to use this value though?
18:20 karolherbst: static in kernel offsets?
18:20 karolherbst: ahh.. this sucks
18:22 karolherbst: is there any benefit of having shorter offsets?
18:22 karolherbst: like do we have a benefit of +0x4 over +0x244
18:22 karolherbst: *with
18:22 imirkin: afaik there's a max static offset size
18:23 karolherbst: would be good to know how high that is
18:23 imirkin: i forget, like 0x8000 i think
18:23 karolherbst: ehhh
18:23 imirkin: maybe i'm mixing up gens though
18:23 imirkin: should look at the encoding
18:23 karolherbst: we talk about shared mem here though
18:23 karolherbst: but I guess having no offset gives us a benefit in some cases?
18:24 karolherbst: pmoreau: from a feeling I would say passing the adjusted offset into the kernel gives us benefits over doing that at runtime with static offsets
18:25 karolherbst: but I don't know why exactly yet :)
18:25 pmoreau: I can also just do an iadd with the offset, to avoid running into issues with encoded offsets.
18:26 karolherbst: pmoreau: why not just add the value in the kernel offset buffer?
18:26 karolherbst: *kernel input buffer
18:26 karolherbst: besides clover not supporting it yet :p
18:27 pmoreau: :-D
18:27 pmoreau: It could be done
18:27 pmoreau: Not sure whether r600 will complain about it or not.
18:28 karolherbst: why would r600 care?
18:29 karolherbst: it's more or less a nv50 specific problem
18:29 karolherbst: just need to wire it up in gallium somehow
18:30 karolherbst: anyway, would like to know what the limitations and benefits are
18:30 pmoreau: Oh, you would only append it to the input buffer on the Nouveau-side.
18:30 pmoreau: Yeah, then r600 should not care.
18:31 karolherbst: well, we can't do it _inside_ nv50
18:31 pmoreau: I was thinking you would be doing something similar to what r600 did, where they just appended stuff to the input buffer.
18:31 karolherbst: it has to be done in clover, but we can do it through an interface where only nv50 cares about...
18:31 karolherbst: but.. mhhh
18:31 karolherbst: ahh.. no
18:31 karolherbst: that's misdesigned :p
18:42 pmoreau: i found the issue regarding var->var being NULL
18:43 pmoreau: It happens for a variable with storage class UniformConstant that has an initialiser.
18:44 pmoreau: vtn_storage_class_to_mode() ends up changing the mode to vtn_variable_mode_cross_workgroup (since options->constant_as_global is set), and no variable is allocated for that mode since “/* These don't need actual variables. */”.
18:45 karolherbst: pmoreau: right, but that's already in the work of being fixed
18:46 pmoreau: Ah okay, cool!
18:46 karolherbst: the issue is just more annoying
18:47 karolherbst: pmoreau: https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/29 plus all discussions .p
18:47 karolherbst: well
18:47 karolherbst: some
18:47 karolherbst: the issue is if you have two indirection levels
18:47 karolherbst: and reference in kernel constants and constant buffers
18:47 karolherbst: but obviously not all drivers want to put all indirectly accessed constants into an ubo
18:48 karolherbst: annoying details
18:49 pmoreau: I see; annoying indeed
18:54 pmoreau: Mmh, are inlined offsets non-existent on Tesla for global memory?
18:55 pmoreau: Let’s see what TGSI does
18:59 imirkin: might not be
19:20 pmoreau: We do not have a pass to split 64-bit adds on Tesla? That seems weird. Or maybe no 64-bit ops are exposed in GLSL on Tesla?
19:22 pmoreau: I like when the emitter is like “sure, here you go: `EMIT: add s64 $r0d $r2d $r0d`”, and then envydis is like “nope: `add b32 $r0 $r2 $r0`”.
19:23 karolherbst: pmoreau: huh.. weird :D
19:23 karolherbst: but yeah, maybe we don't wire it up
19:24 karolherbst: PIPE_CAP_INT64: return 0
19:24 karolherbst: so... yeah
19:26 pmoreau: Welp, time to support those then.
19:26 karolherbst: :)
19:26 karolherbst: have fun
19:26 karolherbst: pmoreau: btw, you can already submit MRs for more or less trivial patches so that you don't have to carry too much locally
19:27 pmoreau: Yeah, I should
19:29 pmoreau: I was thinking of submitting https://gitlab.freedesktop.org/pmoreau/mesa/-/commit/bc055ea0bdeb695af1a77d8d5a3282b7526da4ca and https://gitlab.freedesktop.org/pmoreau/mesa/-/commit/54f5d3b95bcb3c5ad8fc6da3c6410d3a6b04734d
19:29 pmoreau: But I should also do for the nv50 patches.
19:29 karolherbst: ohh, yeah, seems useful
19:33 AndrewR: pmoreau, I think the latest commit at clover_support_hmm_wip_with_nv50 started to segfault luxcoreui, but maybe this is expected ...
19:33 pmoreau: Where does it segfault?
19:36 AndrewR: pmoreau, https://pastebin.com/TAyYkACF
19:36 pmoreau: Thank you!
19:37 pmoreau: Mmh, so it is segfaulting in the nir validator.
19:39 pmoreau: I don’t see how my changes could make it crash since I only made changes that happen after that has run (except for one change, but it does not seem to be the culprit either).
19:40 AndrewR: pmoreau, I'll try to recompile again (maybe not as debugoptimized, but as debug?)
19:40 pmoreau: So I am guessing that it progresses further than before, and now hits a bug in NIR.
19:41 AndrewR: pmoreau, previous commit was showing black screen in luxcoreui, but it appeared to compute something ....
19:54 AndrewR: pmoreau, previous version still work .. (as in, just window with menus and statistics, no actual image)
19:56 pmoreau: I can’t see what would cause that, besides that before it was failing but fell back to some other path.
19:57 pmoreau: But now that something is working better, it no longer is using that fallback path and is hitting a new bug.
19:57 pmoreau: karolherbst: Do you see anything in my patch that could end up causing nir_validate to segfault?
19:58 karolherbst: it is more likely that there are random bugs :p
19:58 pmoreau: That could be too :-D
19:58 pmoreau: The best kind of bugs!
20:06 AndrewR: pmoreau, https://pastebin.pl/view/7ef0ff16 (pastebin said I reached my limit of 10 pastes for 24h!)
20:08 pmoreau: Same reason as in your previous pastebin: somehow `validate_var_decl()` is calling `glsl_without_array()` with a NULL pointer.
20:09 pmoreau: Could be good to investigate what is going on at frame #3, try to figure out where this NULL pointer is coming from.
20:27 AndrewR: pmoreau, I added some printfs: https://pastebin.pl/view/f71cc87c
20:32 AndrewR: but I probably will retract myself to bed ... good night, and thanks for your work!
20:36 pmoreau: Good night! I should go to bed too, and have dinner :-D
20:37 pmoreau: Why am I doing all this manual lowering of 64-bit operations, when I can just ask NIR to do it for me. ¯\_(ツ)_/¯
20:48 pmoreau: How many registers can be accessed on Tesla? Hitting CP_NO_REG_SPACE_STRIPED when using only 17 regs seems surprising.
22:30 karolherbst: pmoreau: uhhhh
22:31 karolherbst: maybe some thread misconfiguration happens as well
22:31 karolherbst: with compute all of that is a bit more explicit probably