04:04pmoreau_: karolherbst: that was on top of your branch, correct.
09:50JayFoxRox: is https://github.com/mesa3d/mesa/blob/39d59cf87a3974142cb69dd52386d96b5e6e7dd9/src/mesa/drivers/dri/nouveau/nv10_state_tnl.c#L241 documented anywhere?
09:50JayFoxRox: it's full of magic LUTs and values, but there's absolutely no description where it came from or what this math does / what the hw will do
09:50JayFoxRox: I'm working on a non-nouveau driver (and emulator) for original Xbox [nv2a] and the MS / nvidia D3D driver actually has different LUTs and formulas
09:50JayFoxRox: (however, the MS / nvidia driver also does the phi / theta thing for D3D instead of cutoff / exponent like GL) - I don't understand that code either
09:50JayFoxRox: the D3D code is shorter and appears to have 32 entries in their LUT; I think they use the same LUT for what are `nv10_shininess_param` and `nv10_spot_params` in nouveau
10:04AndrewR: pmoreau_, I compiled your branch (https://gitlab.freedesktop.org/pmoreau/mesa/-/tree/nv50_opencl_1.0_conformance) and tried LuxCore demo on it :} it failed, but at least for now it comes deeper into its path: https://pastebin.com/wXZyc73D
10:35pmoreau_: Thanks for testing, AndrewR! I’ll have a look at that pastebin later.
10:38karolherbst: spirv-val being stupid?
10:38karolherbst: ohh wait
10:38karolherbst: that's me
10:39karolherbst: AndrewR: mind dumping the OpenCL C file?
10:39AndrewR: karolherbst, I already changed mesa build for clover_support_hmm_wip_with_nv50 , now error a bit different
10:40karolherbst: yeah.. just wanted to say that this error doesn't make much sense :)
10:41AndrewR: karolherbst, https://pastebin.com/5rxeiRqW - probably memory limits set in this version helped .... to some degree
10:42karolherbst: AndrewR: huh? I don't see an error
10:43karolherbst: ohh, it segfaults
10:43karolherbst: I see :D
10:43AndrewR: karolherbst, it disappeared with different branch (but it has some more commits from you)
10:44AndrewR: karolherbst, https://pastebin.com/U3xLsQ9k gdb
10:45karolherbst: yeah well...
10:45karolherbst: no debug info
10:45AndrewR: karolherbst, probably my MESON line installed stripped libs?
10:46AndrewR: karolherbst, rebuilding without 'strip' argument .... sorry, was copying arg line from my normal build
11:14AndrewR: karolherbst, https://pastebin.com/hzgEfBTK - hopefully more interesting gdb
11:17karolherbst: seems like the spirv linker messes up
11:17karolherbst: probably not a mesa bug then
11:18karolherbst: pmoreau_ might be able to help out here
11:18pmoreau_: Was in a meeting; going to look at it now.
11:25AndrewR: karolherbst, is there a possibility it was due to an old vulkan-sdk (on Slackware spirv-tools is folded into it)? Rebuilding it now ....
11:26pmoreau: I saw that “Invalid back or cross-edge in the CFG” error when running some of the CTS tests.
11:26karolherbst: pmoreau: yeah.. doesn't matter, that's on top of master without the structurizer
11:27karolherbst: AndrewR: could be
11:28pmoreau: karolherbst: For me it was on top of clover_support_hmm_wip, IIRC.
11:28pmoreau: Would have to double-check
11:28karolherbst: huh weird
11:28karolherbst: maybe you had an old version :p
11:29pmoreau: Looking at that linker error; looks like a segfault in the callback we pass to the spirv-tools.
11:32pmoreau: Though it does look like there is something weird with the SPIR-V the linker got, cause SPIRV-Tools was trying to print an error message for one of the instructions.
11:34pmoreau: AndrewR: Would you have instructions on how I can reproduce your setup? (Mostly regarding the LuxCore bits.)
11:36AndrewR: pmoreau, I just hacked their build system a bit ... commit 700751bdaf5a7d3fbff58a7130729cac559ad701 at https://github.com/LuxCoreRender/LuxCore and then build it as a usual CMake project (with a build dir, from where you run cmake ../). Moment, will pastebin hack on top of it for python3/numpy/boost thing
11:38AndrewR: pmoreau, https://pastebin.com/EP5rj9RL
11:38pmoreau: Thank you!
11:39AndrewR: pmoreau, well, I was forced to rebuild boost on top of python 3.7, and then added numpy3 via pip3 install ....
11:39AndrewR: pmoreau, boost-1.69.0-i586-3 because boost 1.7x broke bunch of stuff for me ....
11:42AndrewR: pmoreau, and you need embree3, too (my idea was to hack it out too, for some non-x86 arch, but I didn't finish this idea)
11:44AndrewR: pmoreau, #define RTC_VERSION_STRING "3.9.0" from /usr/local/include/embree3/rtcore_config.h (I just dumped it there w/o any packaging)
11:46AndrewR: pmoreau, ah, they have 3.10 for now... https://www.embree.org/
11:50AndrewR: karolherbst, pmoreau it seem segfault is gone with vulkan-sdk-22.214.171.124-i586-1 as from ftp://ftp.slackware.com/pub/slackware/slackware-current/source/x/vulkan-sdk - just minus wayland
11:53AndrewR: https://imagebin.ca/v/5M9ENdiIychj - probably not how it should look ....
11:57pmoreau: AndrewR: Oh, good to know regarding the updated SDK. Which version of SPIRV-Tools were you using before, and which version are you using now? (You can get it by running `spirv-link --version`)
11:58AndrewR: https://pastebin.com/vBpvXgJ9 - luxcoreui after I switched it from default renderer to OCL path ...
11:58pmoreau: And yes, I would agree with the results not looking right
11:59pmoreau: “SPIR-V id 884 is the wrong kind of value” I don’t think I have seen that one before.
11:59AndrewR: pmoreau, spirv-link --version
11:59AndrewR: SPIRV-Tools v2020.1-dev unknown hash, 2020-05-11T11:29:41 - hm, it seems it just printed build time
11:59AndrewR: pmoreau, I'll rebuild mesa fully just in case ....
11:59pmoreau: That’s the version you are using right now?
12:00AndrewR: well, i think yes? Those are packaged by slackbuild script, but I'm not sure if ninja rebuild/relinked everything correctly from dirty build dir
12:00AndrewR: of mesa
12:01pmoreau: I’ll build Lux as soon as the kernel is done building; can’t have too many builds going on at the same time on this old laptop.
12:07AndrewR: pmoreau, https://github.com/LuxCoreRender/LuxCore/commit/659df583a3c36940e2cbc3698e1cdb896ea64dea - maybe I just need to update Lux too
12:09pmoreau: I don’t think that would change much, but who knows.
12:44AndrewR: pmoreau, https://pastebin.com/fWJ1Pmx9 - luxcoreui log (initially it renders a black scene, and if I switch engine/Path from CPU to OCL ... this error appears)
12:46pmoreau: Ah, back to the error you had earlier.
13:17AndrewR: pmoreau, and kitchen scene produces another one: https://pastebin.com/qhmxamYa - unhandled opencl opc: 89
13:19imirkin: JayFoxRox: best thing i can offer is to look at whether mwk RE'd it as part of his nvhw efforts
13:21karolherbst: what's 89
13:21karolherbst: I should rebase on master
13:21karolherbst: we have all natives in now
13:26imirkin: JayFoxRox: hm ... doesn't seem like he has much beyond basic checks. however a grep suggests that R200 also has a LIGHT_SPOT_CUTOFF -- perhaps similar?
13:32AndrewR: karolherbst, so I can just apply https://cgit.freedesktop.org/mesa/mesa/commit/?id=a698c2eedba8195a6486cfb3a2a61dd9fcfa31bb into branch ? (not sure if will actually make scene appear ...)
13:39AndrewR: karolherbst, applied this and two before it ... recompiling
13:40karolherbst: should just work
13:58AndrewR: karolherbst, yes, it starts now .. But window with image still black :} (luxcoredemo also generated a normal.png file, it was also black ... as opposed to the color part I posted to imgbin) - some error in building those normals?
13:59karolherbst: hard to say what's up without debugging
14:01AndrewR: karolherbst, ok (luxcorescenedemo also outputs few black pngs)
14:02karolherbst: yeah.. but we have to fix our known CTS bugs first as well
14:02karolherbst: although there could also be random errors in the logs
14:02karolherbst: hard to tell
14:02pmoreau: Trying to build LuxCore again using one of the AUR packages instead; I ran into some -fPIC issues earlier.
14:05pmoreau: karolherbst: FYI, here are the basic tests that are crashing/asserting, and some quick description about the issue: https://pastebin.com/zgLQDMQc
14:18AndrewR: karolherbst, pmoreau - there also some dmesg errors: https://pastebin.com/t1VdrYNt
14:19karolherbst: pmoreau: yeah..
14:19karolherbst: once I get the core stuff done :D
14:19imirkin: CUDA_FAULT - Global read fault at address 0083dffffc
14:19imirkin: that's probably not great.
14:19pmoreau: No worries, Karol :-)
14:19karolherbst: sounds like OOB stuff
14:19karolherbst: or null pointer access stuff
14:20karolherbst: it will still take some time until we get there
14:21pmoreau: I had to tweak nv50_ir_from_nir to get some read and writes to work. So I wouldn’t be surprised if I messed things up partly, or if more tweaking needs to be done.
14:23pmoreau: Looking at my logs from yesterday's CTS run, I also have a bunch of “Global write fault” and some CP_NO_REG_SPACE_STRIPED
14:23pmoreau: Oh, and one invalid opcode as well. :-D
14:24imirkin: at least it prints the invalid opcode on nv50
14:24imirkin: the compute opcode encoding stuff is not super-well-tested
14:24pmoreau: Indeed: “TRAP_MP_EXEC - TP 0 MP 1: 00000010 [INVALID_OPCODE] at 000258 warp 7, opcode 2c00c019 04220784”
14:25imirkin: looks like a shared STORE
14:26imirkin: envydis decodes it thusly: 00000000: 2c00c019 04220784 add b32 $r6 b32 s[$a7] $r8
14:27imirkin: unfortunately my version of nvdisasm no longer supports SM10
14:28pmoreau: Mmh, does add accept a shared as first argument? Or is it only for the second one?
14:28imirkin: oh wait, no, there it goes
14:28imirkin: /*0000*/ IADD R6, g [A7+0x0], R8;
14:28imirkin: sooo ... not COMPLETELY identical :)
14:28karolherbst: imirkin: did you add the CP shader type to envydis?
14:28imirkin: of course
14:28pmoreau: I mean, 's', 'g', who cares: it’s still a letter! ;-)
14:29imirkin: envydis -m g80 -V g84 -O cp -w
14:29karolherbst: I expect there to be a couple of encoding issues
14:29karolherbst: but pmoreau wants to fix those :p
14:29imirkin: the fact that they disagree is rather unfortunate.
14:30pmoreau: I can not RE those though, but I can test fixes. :-D
14:30imirkin: you can run stuff through nvdisasm
14:30pmoreau: I should be able to do that
14:31pmoreau: :-/ Still getting that -fPIC error.
14:35imirkin: hm, interesting
14:36imirkin: add b32 $r6 b32 g[$a7] $r8
14:36imirkin: stdin:1.1-2.1: No match
14:36imirkin: so i guess it's just backwards
14:36imirkin: pmoreau: try to see what the original shader wanted btw
14:37pmoreau: I’ll try to
14:38imirkin: and then the next step is to figure out why hw doesn't like that op
14:38imirkin: even though nvdisasm says it's fine
14:50pmoreau: Found the test
14:50pmoreau: Let’s see..
14:51imirkin: g[$a7] is dodgy btw - address regs are only like 20-bit or something
14:58pmoreau: I think I had tweaked Nouveau to avoid using the address regs.
15:02pmoreau: Arf, I see a pattern between what codegen think it is generating, and what envydis tells me: all the `s[$rX+0xYZ]` end up as `s[$aW]` where `$aW` is never set.
15:02pmoreau: For example, `ne $c0 ld u32 $r0 s[$r1+0x0]` becomes `(lg $c0) ld $r0 b32 s[$a2]`
15:06imirkin: probably less-than-ideal
15:07imirkin: but i don't think s[$r1] is a thing
15:07imirkin: i think you can only use $a with shared
15:07imirkin: although looking at it, doesn't seem like g[$r1] is a thing either
15:07imirkin: might be stuck with the address regs
15:09pmoreau: g[$rX] did work for me AFAICR; I never used the address regs when I implemented my SPIR-V -> NV50 IR translator.
15:09imirkin: ah ok
15:09imirkin: maybe i just can't get envyas to synthesize it
15:09pmoreau: Does this work? `d00f0401 a0c00280 (lg $c0) st b32 g15[$r2] $r0`
15:10imirkin: but no ld, apparently
15:10pmoreau: Might have been missing the number after the g
15:10imirkin: oh yes
15:10imirkin: i was
15:10pmoreau: "d00f0c19 80c00780 ld b32 $r6 g15[$r6]"
15:10imirkin: there it goes
15:10imirkin: ld b32 $r0 g0[$r1]
15:11imirkin: but no such match with s[$r1]
15:11imirkin: well at least this is nice ... a logical explanation to what's going on
15:12pmoreau: But now I need to understand how to use address regs, instead of continuing to ignore them. :-D
15:12imirkin: at the nv50 ir level ... pretty easy
15:12imirkin: it's already done in all the graphics stages
15:12imirkin: so you just need to re-enable that, and have it skip for gmem
15:14pmoreau: Basically undo https://pastebin.com/7Yhbm1Zv
15:14pmoreau: I think
15:15imirkin: you should never be getting indirection on 1
15:15imirkin: only on 0
15:15imirkin: i.e. when are you seeing src.isIndirect(1) ?
15:16imirkin: anyways, that stuff seems completely wrong
15:16pmoreau: Good question; I just copy/pasted that from my old translator, cause it seemed to help with loads when using Karol’s NIR->NV50 IR work.
15:16karolherbst: imirkin: but global loads are two dimensions on tesla, no?
15:16pmoreau: But I can’t remember why I did that
15:17imirkin: karolherbst: never indirection on second dim though
15:17imirkin: pmoreau: i expect that your "else" case is fine. but it just needs to be a if compute && sym->inFile(FILE_SHADER_INPUT) -> redo it as shared
15:17karolherbst: what about const buffers?
15:18imirkin: but that has nothing to do with lowering to address registers
15:18imirkin: karolherbst: no
15:18imirkin: not on nv50
15:18imirkin: karolherbst: only case is in geom shaders.
15:18imirkin: since you can indirect in an input array, and on vertex
15:18karolherbst: pmoreau: mind sharing the nv50ir output when hitting this issue?
15:20imirkin: (i think karolherbst means when that indirect thing happens, not for the register/address thing)
15:21imirkin: (and i don't think it ever does, it's just bogus code ported over)
15:21karolherbst: although I think this was just to fix the offset when loading kernel inputs :)
15:21karolherbst: the patch I mean
15:21imirkin: hence my comment above.
15:22karolherbst: but I was under the assumption I fixed it.. partly
15:22pmoreau: So you want to see how it looks like for a simple load without my tweak, right?
15:23karolherbst: pmoreau: https://github.com/karolherbst/mesa/commit/33a9b9fce5a660f19a7a79b34bb414f264abd49b
15:23karolherbst: there I already use FILE_SHADER_INPUT
15:23karolherbst: and FILE_SHADER_INPUT should get lowered to shared + that offset
15:23karolherbst: maybe we don't add the offset in the lowering though
15:24pmoreau: IIRC the offset was never being added
15:24karolherbst: I guess we should move that into the lowering
15:24pmoreau: One sec, double-checking that
15:25karolherbst: makes more sense there than in the IR converters
15:25pmoreau: Right, without the patch the offset is not being added
15:26pmoreau: `10000001 0423c780 102a8005 00000003 d00f0005 a0c00781` without the patch, `10000801 4400c780 102a8005 00000003 d00f0005 a0c00781` with the patch
15:26karolherbst: but then we need to fix the offset :)
15:26karolherbst: and probably remove the offset when fromTGSI adds it
15:26karolherbst: I wouldn't mind fixing fromNIR, but it makes more sense to have it in the lowering
15:27pmoreau: My change was in `NV50LoweringPreSSA::handleLOAD()`, not in your nir pass. :-)
15:27karolherbst: yeah.. I know :)
15:27pmoreau: Though, something better could probably be achieved
15:28karolherbst: pmoreau: += info->prop.cp.inputOffset
15:28pmoreau: What is `a` again?
15:28karolherbst: check Converter::makeSym in fromTGSI
15:29karolherbst: but that also sounds a bit messy
15:29karolherbst: ohh wait.. tgsiFile
15:29karolherbst: yeah, whtvr
15:29karolherbst: also if type == compute and file == SHADER_IN, file = SHARED and offset += info->prop.cp.inputOffset
15:29imirkin: isn't that what i said?
15:29karolherbst: probably :)
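The rule both of them describe here (for a compute shader, treat a `FILE_SHADER_INPUT` symbol as shared memory and add the input offset) can be sketched as follows. This is only an illustrative Python model, not the actual nv50 IR code: the `Symbol` class, the file constants, and `lower_kernel_input` are hypothetical names, and the `input_offset` parameter stands in for `info->prop.cp.inputOffset`.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the register files mentioned in the log.
FILE_SHADER_INPUT = "shader_input"
FILE_MEMORY_SHARED = "shared"


@dataclass
class Symbol:
    file: str    # which register file / memory space the symbol lives in
    offset: int  # byte offset inside that file


def lower_kernel_input(sym, is_compute, input_offset):
    """Redirect a compute kernel input access to shared memory.

    Every other symbol is returned unchanged, mirroring the advice to
    leave the old code alone and only add the extra `if`.
    """
    if is_compute and sym.file == FILE_SHADER_INPUT:
        return Symbol(FILE_MEMORY_SHARED, sym.offset + input_offset)
    return sym
```

With an input offset of 0x10, the kernel argument at byte 4 would then be read from shared offset 0x14, while symbols in other files or in non-compute shaders pass through untouched.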
15:31karolherbst: how big is the shared memory btw?
15:31karolherbst: on tesla that is
15:31karolherbst: I guess it's not very big
15:31imirkin: smaller than the number of bits in an address reg
15:32imirkin: not even getting to loading msm_dri
15:32karolherbst: the kernel input needs to be at least 1kb big.. I guess we have that much space
15:32imirkin: wrong window on that last comment
15:37pmoreau: So your comment, unless I misunderstood, was to switch to this: https://pastebin.com/cNZ7jyQ3 but that should be exactly the same as what I had before.
15:37imirkin: pmoreau: why the if/else
15:37imirkin: pmoreau: leave the old code alone
15:37imirkin: pmoreau: and just add the if (...)
15:38imirkin: and yes, that should be fine
15:50karolherbst: pmoreau: also, please use info->prop.cp.inputOffset instead of the hardcoded 0x10
15:53pmoreau: karolherbst: Changed
16:17pmoreau: It does not seem like prop.cp.sharedOffset is ever set for nv50.
16:18pmoreau: inputOffset is, but not sharedOffset
16:23pmoreau: Loading kernel arguments needs to be offset by inputOffset
16:23pmoreau: but also all regular loads and stores to shared mem need to be offset by sharedOffset
16:23pmoreau: To compensate for the kernel arguments being in shared.
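The two offsets described here can be pictured as a simple shared memory layout. The sketch below is a guess at that layout, not what the driver does: it assumes the reserved header is 0x10 bytes (the value hardcoded earlier in the log), that kernel arguments follow directly after it, and that `sharedOffset` would equal the header size plus the argument size. The function names are hypothetical.

```python
INPUT_OFFSET = 0x10  # assumed size of the reserved area before the kernel arguments


def kernel_arg_address(arg_offset, input_offset=INPUT_OFFSET):
    """Shared-memory address of a kernel argument."""
    return input_offset + arg_offset


def user_shared_address(offset, input_size, input_offset=INPUT_OFFSET):
    """Shared-memory address of a regular (user-visible) shared access.

    Assumption: user data starts right after the kernel arguments, so the
    shared offset is the input offset plus the total argument size.
    """
    shared_offset = input_offset + input_size
    return shared_offset + offset
```

For example, with 0x20 bytes of arguments, the user's shared offset 0 would land at 0x30 under these assumptions.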
16:24karolherbst: mhhh, makes sense
16:24karolherbst: I don't think we have that metadata actually...
16:26pmoreau: There is cp.sharedOffset, which is used in the TGSI frontend, but I don’t think we have the size of input arguments available.
16:26karolherbst: there are two things actually
16:26karolherbst: input size for tesla and the shared memory buffers
16:26karolherbst: but the latter are runtime variable
16:27karolherbst: pmoreau: well, that code is disabled in TGSI
16:28pmoreau: Oh right
16:28karolherbst: also, it's never set
16:28karolherbst: just used
16:29pmoreau: Correct, which is what I was referring to by “It does not seem like prop.cp.sharedOffset is ever set for nv50.”
16:29karolherbst: shader_info should contain the input size
16:29karolherbst: shader_info.cs is where I would put a field "input size" probably
16:29karolherbst: let me check
16:30karolherbst: and then we set that from inside clover
16:30karolherbst: when doing the offset calculation
16:31karolherbst: pmoreau: you see the loop inside spirv_to_nir?
16:31karolherbst: just use that offset value and put it into shader_info.cs.input_size :)
16:31karolherbst: offset after the loop should contain the size of all inputs
16:32karolherbst: might need some alignment though
16:32karolherbst: so we want to align it ourselves to 0x10 or something
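The suggestion of taking the running offset after the argument loop and rounding it up to 0x10 might look like the sketch below. It is a simplification under stated assumptions: arguments are packed back to back with no per-argument padding (the real loop may align each argument), and `align_up` and `kernel_input_size` are hypothetical helper names.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def kernel_input_size(arg_sizes, alignment=0x10):
    """Total size of all kernel inputs, rounded up to `alignment` bytes."""
    offset = 0
    for size in arg_sizes:
        offset += size  # assumption: no padding between arguments
    return align_up(offset, alignment)
```

Two 8-byte pointers and one 4-byte integer occupy 20 bytes, which rounds up to 0x20.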
16:33pmoreau: One problem we could run into, is running out of shared memory: if the user assumes they can use all the shared memory available, but we run out due to the arguments being stored there.
16:33karolherbst: pmoreau: we should advertise less then ;)
16:33karolherbst: advertise 1k for the kernel input and the rest for shared mem
16:33pmoreau: Also once we have loaded the arguments into registers, we no longer need to hold onto that shared mem.
16:33karolherbst: or so
16:34karolherbst: pmoreau: I am sure that's not significant
16:34karolherbst: 1k isn't that much
16:34karolherbst: anyway, we can do perf opts later :)
16:37pmoreau: We are already advertising 16 KB, which is the minimum value for being OpenCL 1.0 compliant.
16:37karolherbst: pmoreau: 16k for shared mem, right?
16:37karolherbst: how much do we have?
16:38pmoreau: Looking it up
16:42pmoreau: CUDA docs are referring to max 16 KB per SM, but that does not really say how much there actually is.
16:44pmoreau: Not sure if we have a way to read it out from the hardware itself.
16:44karolherbst: we can check what nvidia does actually :D
16:45pmoreau: Pfff, too easy
16:45karolherbst: ohh, I still have my kernel to check how many in kernel constants there can be :D
16:45RSpliet: Don't arguments go into constmem instead of local/shared mem?
16:46karolherbst: pmoreau: ohh, maybe you know it: do you know if CL states that the amount of constants inside a kernel has to be within the const buffer limit?
16:46karolherbst: RSpliet: tesla
16:47RSpliet: but edison... do they not support constmem?
16:47pmoreau: There is CL_DEVICE_MAX_PARAMETER_SIZE that gives you the “Max size in bytes of all arguments that can be passed to a kernel.”
16:47pmoreau: No idea about the constants
16:48karolherbst: I hate new nvidia drivers :D
16:49pmoreau: Well, you would want to check against an old one, cause new ones no longer support Tesla. Or what are you trying to check?
16:50karolherbst: I compile cl to ptx and check what they do
16:50karolherbst: but.. uff
16:51karolherbst: MOV R1, c[0x1][0x100]
16:51karolherbst: why aren't we using const buffers? :O
16:52pmoreau: I was looking at the wrong loop in the wrong spirv_to_nir :-D
16:52karolherbst: c[0x0][0x20] used for the src ptr
16:52RSpliet:sips on some tea
16:52karolherbst: imirkin: so uhm.. why are we using shared for kernel inputs again?
16:52pmoreau: Are you generating for Tesla? Or for some Fermi
16:53karolherbst: I think
16:53karolherbst: maybe I messed up
16:53karolherbst: let me check
16:54karolherbst: ahh shit..
16:54karolherbst: cuda too new
16:54karolherbst: only sm20
16:56pmoreau: I think you need something like CUDA <= 5
16:58karolherbst: pmoreau: 6.5 :)
16:58karolherbst: nvidia has all those nice fedora repos
16:58karolherbst: so it's easy to check
16:59karolherbst: what the hell
16:59pmoreau: I should have looked on AUR:
16:59pmoreau: aur/cuda65 6.5.19-2 (1) (0.00)
16:59pmoreau: NVIDIA's GPU programming toolkit (for compute capability < 2.0)
17:00karolherbst: that might also explain that s vs g confusion
17:00karolherbst: g [0x4], vs global14[R1];
17:01karolherbst: this.. thing is weird :D
17:01simernes: Hi. My desktop environment (KDE) keeps randomly freezing. Any suggestions on how to troubleshoot? I can still ssh into the machine, but I can't open new ttys with ctrl+alt+fx. It seems to maybe be correlated with jumping between tabs in firefox and similar gui-ish actions although I'm not sure about that. Sometimes it freezes just for a while and I'm lucky in that it wakes up from the freeze, but
17:01simernes: other times not.
17:01simernes: I'm on void linux with kernel 5.4.40, with GTX 650 and nouveau.
17:01karolherbst: what is g?
17:02karolherbst: simernes: killing firefox/chromium might solve it
17:02karolherbst: firefox most likely
17:02karolherbst: or other applications using the GPU
17:02karolherbst: simernes: can you confirm that killing firefox through ssh unfreezes it?
17:02simernes: I will next time. You think it's out of memory related?
17:02pmoreau: global14 -> g14 for us
17:03karolherbst: pmoreau: IADD32 R1, g [0x4], R0; /* 0x2100e804 */
17:04pmoreau: And g -> s for us, AFAICT
17:04karolherbst: but this is weird
17:04pmoreau: It is weird
17:04karolherbst: IADD32 R2, g [0x5], R0; /* 0x2100ea08 */
17:04karolherbst: why that 0x5?
17:05karolherbst: that's just nvdisasm being stupid
17:05pmoreau: I’m thinking those could be accessing ntid & co
17:06karolherbst: they just print in a stupid way
17:06pmoreau: Cause s[0x0:0x10] is where ntid & co live on Tesla, which is why we start at 0x10
17:06karolherbst: but the offsets are false
17:06karolherbst: add b32 $r1 b32 s[0x10] $r0 and add b32 $r2 b32 s[0x14] $r0
17:07karolherbst: [0x1].U16 -> 0x2
17:07karolherbst: [0x4] -> 0x10
17:07karolherbst: okay.. anyway
17:07karolherbst: that helps
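The mapping worked out above (`[0x4]` is byte 0x10, `[0x1].U16` is byte 0x2) is consistent with nvdisasm printing an element index rather than a byte offset. A minimal sketch of that conversion, assuming the index is simply multiplied by the access size (4 bytes by default, 2 for `.U16`); the function name is made up for illustration:

```python
def nvdisasm_index_to_byte_offset(index, element_bytes=4):
    """Convert the element index nvdisasm prints into a byte offset.

    Assumption: the index is in units of the access size, which matches
    the three data points quoted in the log.
    """
    return index * element_bytes
```

Under that reading, `g [0x4]` and `g [0x5]` correspond to `s[0x10]` and `s[0x14]` in envydis notation.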
17:07pmoreau: Ehhh :o
17:07karolherbst: soo.. let's see what happens if we use local memory buffers
17:09karolherbst: pmoreau: it's too easy :p
17:09karolherbst: guess what they do
17:11karolherbst: but I guess there is no inherent benefit of fixing up the runtime value vs offsetting in the kernel
17:12pmoreau: Do they just offset?
17:12pmoreau: They read the input size from a constant buffer or something?
17:12karolherbst: at runtime
17:13karolherbst: they read the offset from the constant buffer
17:13karolherbst: you can apply the kernel input offset outside the shader
17:14karolherbst: mhhh.. nvidia supported CL1.1 for tesla though
17:14karolherbst: apparently yes..
17:14pmoreau: OpenCL 1.1 mandated 32 KB of shared memory
17:14karolherbst: maybe wikipedia is wrong though :D
17:14karolherbst: 1.1 mandates 16kb
17:14karolherbst: or was that 1.0?
17:14karolherbst: ahh, 1.1 already went to 32k
17:15pmoreau: Yeah, I was hoping we could get 1.1 or maybe 1.2 for Tesla, but it does not look like it because of that.
17:16karolherbst: pmoreau: do you know what the kernel input size is reported by nvidia?
17:16pmoreau: I do not
18:15pmoreau: This will do for now: https://gitlab.freedesktop.org/pmoreau/mesa/-/commit/a417fac7bd6635d01e83dcc19bf576fd960db689
18:17pmoreau: Got a few extra tests passing with this. Going to look into why I have so many tests crashing in vtn_variable_create() at `var->var->constant_initializer =`, due to var->var == NULL.
18:17karolherbst: pmoreau: don't set inputOffset :D
18:17karolherbst: nv50 already does so
18:18karolherbst: ohh wait
18:18karolherbst: I misread the diff
18:18karolherbst: I'd set sharedOffset from within codegen though probably
18:19karolherbst: sharedOffset is more like a result of the compilation
18:19karolherbst: than anything else
18:19karolherbst: and we need to be careful due to alignment and stuff
18:19karolherbst: I think...
18:20pmoreau: I could set it only for nv50 in nv50_ir_from_nir
18:20pmoreau: I considered doing that
18:20karolherbst: currently wondering
18:20karolherbst: how do we want to use this value though?
18:20karolherbst: static in kernel offsets?
18:20karolherbst: ahh.. this sucks
18:22karolherbst: is there any benefit of having shorter offsets?
18:22karolherbst: like do we have a benefit of +0x4 over +0x244
18:22imirkin: afaik there's a max static offset size
18:23karolherbst: would be good to know how high that is
18:23imirkin: i forget, like 0x8000 i think
18:23imirkin: maybe i'm mixing up gens though
18:23imirkin: should look at the encoding
18:23karolherbst: we talk about shared mem here though
18:23karolherbst: but I guess having no offset gives us a benefit in some cases?
18:24karolherbst: pmoreau: from a feeling I would say passing the adjusted offset into the kernel gives us benefits over doing that at runtime with static offsets
18:25karolherbst: but I don't know why exactly yet :)
18:25pmoreau: I can also just do an iadd with the offset, to avoid running into issues with encoded offsets.
18:26karolherbst: pmoreau: why not just add the value in the kernel offset buffer?
18:26karolherbst: *kernel input buffer
18:26karolherbst: besides clover not supporting it yet :p
18:27pmoreau: It could be done
18:27pmoreau: Not sure whether r600 will complain about it or not.
18:28karolherbst: why would r600 care?
18:29karolherbst: it's more or less a nv50 specific problem
18:29karolherbst: just need to wire it up in gallium somehow
18:30karolherbst: anyway, would like to know what the limitations and benefits are
18:30pmoreau: Oh, you would only append it to the input buffer on the Nouveau-side.
18:30pmoreau: Yeah, then r600 should not care.
18:31karolherbst: well, we can't do it _inside_ nv50
18:31pmoreau: I was thinking you would be doing something similar to what r600 did, where they just appended stuff to the input buffer.
18:31karolherbst: it has to be done in clover, but we can do it through an interface where only nv50 cares about...
18:31karolherbst: but.. mhhh
18:31karolherbst: ahh.. no
18:31karolherbst: that's misdesigned :p
18:42pmoreau: i found the issue regarding var->var being NULL
18:43pmoreau: It happens for a variable with storage class UniformConstant that has an initialiser.
18:44pmoreau: vtn_storage_class_to_mode() ends up changing the mode to vtn_variable_mode_cross_workgroup (since options->constant_as_global is set), and no variable is allocated for that mode since “/* These don't need actual variables. */”.
18:45karolherbst: pmoreau: right, but that's already in the work of being fixed
18:46pmoreau: Ah okay, cool!
18:46karolherbst: the issue is just more annoying
18:47karolherbst: pmoreau: https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/29 plus all discussions :p
18:47karolherbst: the issue is if you have two indirection levels
18:47karolherbst: and reference in kernel constants and constant buffers
18:47karolherbst: but obviously not all drivers want to put all indirectly accessed constants into an ubo
18:48karolherbst: annoying details
18:49pmoreau: I see; annoying indeed
18:54pmoreau: Mmh, are inlined offsets non-existent on Tesla for global memory?
18:55pmoreau: Let’s see what TGSI does
18:59imirkin: might not be
19:20pmoreau: We do not have a pass to split 64-bit adds on Tesla? That seems weird. Or maybe no 64-bit ops are exposed in GLSL on Tesla?
19:22pmoreau: I like when the emitter is like “sure, here you go: `EMIT: add s64 $r0d $r2d $r0d`”, and then envydis is like “nope: `add b32 $r0 $r2 $r0`”.
19:23karolherbst: pmoreau: huh.. weird :D
19:23karolherbst: but yeah, maybe we don't write it up
19:24karolherbst: PIPE_CAP_INT64: return 0
19:24karolherbst: so... yeah
19:26pmoreau: Welp, time to support those then.
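Splitting a 64-bit add for hardware that only has 32-bit integer adds generally means adding the low halves, capturing the carry, and feeding it into the add of the high halves. The sketch below models that idea in plain Python; it is a generic illustration of the technique, not the lowering pass discussed here, and `add64_via_32` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def add64_via_32(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values given as (lo, hi) 32-bit halves.

    Returns the (lo, hi) halves of the 64-bit sum, wrapping on overflow.
    """
    lo = a_lo + b_lo
    carry = lo >> 32                     # 1 if the low add overflowed 32 bits
    hi = (a_hi + b_hi + carry) & MASK32  # high add consumes the carry
    return lo & MASK32, hi
```

Adding 1 to 0x00000000FFFFFFFF carries into the high word and gives (0x0, 0x1).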
19:26karolherbst: have fun
19:26karolherbst: pmoreau: btw, you can already submit MRs for more or less trivial patches so that you don't have to carry too much locally
19:27pmoreau: Yeah, I should
19:29pmoreau: I was thinking of submitting https://gitlab.freedesktop.org/pmoreau/mesa/-/commit/bc055ea0bdeb695af1a77d8d5a3282b7526da4ca and https://gitlab.freedesktop.org/pmoreau/mesa/-/commit/54f5d3b95bcb3c5ad8fc6da3c6410d3a6b04734d
19:29pmoreau: But I should also do for the nv50 patches.
19:29karolherbst: ohh, yeah, seems useful
19:33AndrewR: pmoreau, I think the latest commit at clover_support_hmm_wip_with_nv50 started to segfault luxcoreui, but maybe this is expected ...
19:33pmoreau: Where does it segfault?
19:36AndrewR: pmoreau, https://pastebin.com/TAyYkACF
19:36pmoreau: Thank you!
19:37pmoreau: Mmh, so it is segfaulting in the nir validator.
19:39pmoreau: I don’t see how my changes could make it crash since I only made changes that happen after that has run (except for one change, but it does not seem to be the culprit either).
19:40AndrewR: pmoreau, I'll try to recompile again (may be not as debugoptimized, but as debug?)
19:40pmoreau: So I am guessing that it progresses further than before, and now hits a bug in NIR.
19:41AndrewR: pmoreau, previous commit was showing black screen in luxcoreui, but it appeared to compute something ....
19:54AndrewR: pmoreau, previous version still work .. (as in, just window with menus and statistics, no actual image)
19:56pmoreau: I can’t see what would cause that, besides that before it was failing but fell back to some other path.
19:57pmoreau: But now that something is working better, it no longer is using that fallback path and is hitting a new bug.
19:57pmoreau: karolherbst: Do you see anything in my patch that could end up causing nir_validate to segfault?
19:58karolherbst: it is more likely that there are random bugs :p
19:58pmoreau: That could be too :-D
19:58pmoreau: The best kind of bugs!
20:06AndrewR: pmoreau, https://pastebin.pl/view/7ef0ff16 (pastebin said I reached my limit of 10 pastes for 24h!)
20:08pmoreau: Same reason as in your previous pastebin: somehow `validate_var_decl()` is calling `glsl_without_array()` with a NULL pointer.
20:09pmoreau: Could be good to investigate what is going on at frame #3, try to figure out where this NULL pointer is coming from.
20:27AndrewR: pmoreau, I added some printfs: https://pastebin.pl/view/f71cc87c
20:32AndrewR: but I probably will retract myself to bed ... good night, and thanks for your work!
20:36pmoreau: Good night! I should go to bed too, and have dinner :-D
20:37pmoreau: Why am I doing all this manual lowering of 64-bit operations, when I can ask NIR to do it for me. ¯\_(ツ)_/¯
20:48pmoreau: How many registers can be accessed on Tesla? Hitting CP_NO_REG_SPACE_STRIPED when using only 17 regs seems surprising.
22:30karolherbst: pmoreau: uhhhh
22:31karolherbst: maybe some thread misconfiguration happens as well
22:31karolherbst: with compute all of that is a bit more explicit probably