16:57 bamse: given the second check in nvkm_pmu_fan_controlled(), should i expect my gm204 to control the fan automatically? or am i missing something?
16:57 imirkin: bamse: you should.
16:57 imirkin: GM20x requires signed firmware to control, among other things, fan speed
16:58 imirkin: nvidia supplies some PMU firmware in linux-firmware which does it automatically without allowing the (OS) driver to do anything about it
16:58 imirkin: bamse: that said, if it's a laptop, cooling is often controlled by the EC, and there is no nvidia-attached fan control
16:59 bamse: imirkin: it's my desktop...and it's not doing any fan control
16:59 bamse: just sits there at a nice 90-95 degrees
16:59 imirkin: thank you for choosing nvidia. we appreciate you have a choice of gpu vendors, and you appear to have chosen the wrong one.
17:00 bamse: but am i missing said firmware? or does it not exist?
17:00 imirkin: it's in linux-firmware
17:00 bamse: so it should be loaded automagically?
17:00 imirkin: failure to load it would result in no acceleration
17:00 imirkin: (and also some errors in dmesg)
17:00 bamse: and a firmware loading error message in dmesg?
17:01 imirkin: just run 'glxinfo' and see if you get nouveau reported as the driver
17:01 imirkin: if you do, that means you have accel and thus firmware
17:01 bamse: yeah, both dmesg and glxinfo indicate that it's working as expected
17:02 imirkin: (just make sure glxinfo doesn't say llvmpipe...)
17:02 bamse: nah, my vendor is "nouveau"
17:02 imirkin: i have no idea what policy the rtos in the pmu is following
17:02 imirkin: presumably something slightly vbios-driven? dunno
17:05 bamse: imirkin: thanks for confirming my expectations :)
17:06 imirkin: yeah, sorry i don't have better news for you. in the future, consider this experience when allocating funds towards a new gpu
17:07 imirkin: some people have gone as far as to tear off the "real" fans and install directly controlled ones
17:07 imirkin: (very few)
17:09 imirkin: bamse: some day i'll go back to the apq8084 pcie thing :) i just got really confused, and then some other stuff happened, and now it's like a year later.
17:12 bamse: imirkin: well, perhaps we can get DP working on my 8cx laptop and use that to drive my monitor instead...
17:12 bamse: imirkin: and i'll be ready to review those pcie patches when you find the time :)
17:12 imirkin: lol
17:40 pmoreau: Do we already have somewhere hardware limits like the maximum number of registers that can be used by a block?
17:41 pmoreau: I have a commit (https://gitlab.freedesktop.org/pmoreau/mesa/-/commit/9c0c2829f8d937343e64d1e2282f23324e0fc3e9) where I avoid dispatching a kernel if we know it will go over some hardware limits, but I’m wondering if there is a better way than hardcoding the limits in Mesa.
18:09 imirkin: pmoreau: we figure out the max number of regs in the compiler
18:09 imirkin: pmoreau: probably incorrectly for nv50 :)
18:46 karolherbst: imirkin: CL is stupid
18:46 karolherbst: pmoreau: yeah, we have to fix clover first
18:47 karolherbst: soo.. what we have to change in the gallium API is that compilation has to report back how many threads a kernel can be launched with at most, or something
18:47 imirkin: karolherbst: well, basically max number of regs is sizeof(shared mem) / local group size
18:47 karolherbst: we don't know the local group size when compiling :)
18:48 karolherbst: for GL yes, for CL no
18:48 imirkin: i mean ... you do and you don't
18:48 imirkin: there's a group size variable ext
18:48 karolherbst: atm we do as we only compile to the hw ISA once we enqueue the kernel
18:48 imirkin: which enables you to have API-set local group size
18:48 karolherbst: imirkin: in CL we don't know until the kernel gets launched
18:48 imirkin: and it's allowed to report a lower limit
18:49 imirkin: than the "regular" limit
18:49 karolherbst: sure, for GL it's all well defined
18:49 imirkin: karolherbst: look at the variable one
18:49 karolherbst: it's still better than for CL
18:49 imirkin: GL_ARB_compute_variable_group_size
18:50 imirkin: the group size is part of the (new) DispatchComputeGroupSizeARB function
18:50 karolherbst: imirkin: can you query the actual upper limit for a compiled shader?
18:50 imirkin: you can query the upper limit for the "variable" shaders
18:50 imirkin: most hw will report the most conservative number of regs
18:50 karolherbst: huh
18:50 imirkin: based on the max allowed group size
18:50 karolherbst: so we don't even support that correctly in mesa
18:51 karolherbst: as I am sure we have no gallium API to report it based on a compiled shader
18:51 imirkin: er
18:51 imirkin: let me rephrase
18:51 imirkin: what i said is misleading
18:51 imirkin: 1. a shader has to indicate it has "variable" group size
18:51 imirkin: 2. when compiling, we assume that the "max" group size will be used, although there are other strategies
18:52 imirkin: 3. there is a *separate* limit for the max variable group size, which may be lower than the max fixed group size
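(For reference, a rough sketch of how GL_ARB_compute_variable_group_size looks from the application side; the shader is assumed to declare "layout(local_size_variable) in;", and the group counts/sizes below are made-up example values:)

    /* Sketch of a variable-group-size dispatch; assumes a current GL context,
     * a loader exposing the ARB entry point, and a compute program "prog"
     * whose shader declares "layout(local_size_variable) in;". */
    #include <GL/glew.h>

    static void dispatch_variable(GLuint prog)
    {
        GLint max_invocations;  /* total threads per group for variable-size dispatch */
        GLint max_size[3];      /* per-dimension limits; may be lower than the fixed-size ones */

        glGetIntegerv(GL_MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB, &max_invocations);
        for (int i = 0; i < 3; i++)
            glGetIntegeri_v(GL_MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB, i, &max_size[i]);

        glUseProgram(prog);
        /* the local group size is passed at dispatch time instead of being
         * baked into the shader */
        glDispatchComputeGroupSizeARB(64, 1, 1,    /* number of groups */
                                      128, 1, 1);  /* local group size, within the limits above */
    }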
18:53 karolherbst: sure, but I still don't think we have the actual gallium API to report better values
18:55 karolherbst: or maybe I missed something, but when I looked into it for CL I think we didn't
18:57 karolherbst: uhh, GL also has a crappy API
18:57 karolherbst: so for gallium we have PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK
18:58 karolherbst: and this is a driver constant
18:58 karolherbst: but CL is.... stupid
18:58 imirkin: it's a limit on what can be passed into the DispatchComputeGroupSizeARB function
18:58 karolherbst: so, what you have to do for CL is: compile the kernel, calculate how big the blocks can be
18:58 karolherbst: and the application can query it per shader
18:58 karolherbst: *kernel
18:58 imirkin: ah ok
18:58 imirkin: well you'll need new api's for that
18:58 karolherbst: exactly
18:59 karolherbst: CL by default also allows you to specify the block and group size
18:59 karolherbst: group size without a limit :)
18:59 karolherbst: whatever fits in size_t
18:59 karolherbst: but that's a different issue
19:00 karolherbst: and CL requires 3 dimensions on the group :)
19:00 karolherbst: same problem though
19:01 karolherbst: pmoreau: so yeah.. we need to change the gallium interfaces a little
19:01 karolherbst: applications can query the limits per kernel
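(The per-kernel query that already exists on the CL API side is clGetKernelWorkGroupInfo; a minimal sketch, assuming a kernel and device that have already been created from a successfully built program:)

    /* Query per-kernel limits in OpenCL; "kernel" and "device" are assumed to
     * already exist (program built, kernel created). */
    #include <CL/cl.h>
    #include <stdio.h>

    static void print_kernel_limits(cl_kernel kernel, cl_device_id device)
    {
        size_t max_wg_size;               /* max work-items per work-group for this kernel */
        cl_ulong local_mem, private_mem;  /* per-kernel local/private memory usage */

        clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                                 sizeof(max_wg_size), &max_wg_size, NULL);
        clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_LOCAL_MEM_SIZE,
                                 sizeof(local_mem), &local_mem, NULL);
        clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_PRIVATE_MEM_SIZE,
                                 sizeof(private_mem), &private_mem, NULL);

        printf("max work-group size: %zu, local mem: %llu, private mem: %llu\n",
               max_wg_size, (unsigned long long)local_mem, (unsigned long long)private_mem);
    }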
19:01 pmoreau: “basically max number of regs is sizeof(shared mem) / local group size” why is this dependent on shared memory? The register file should be separate.
19:01 karolherbst: and clover should already block launching those
19:01 karolherbst: problem is... we compile too late
19:01 karolherbst: we essentially have to be able to compile without a context :/
19:02 karolherbst: it's a huge mess
19:02 pmoreau: True, and robclark had a series to add those per-kernel limits
19:02 karolherbst: yeah
19:02 pmoreau: And in which case, it would make sense to have clover block those.
19:03 karolherbst: yep
19:03 karolherbst: hence r600 underreports
19:03 karolherbst: assumes the worst case
19:03 pmoreau: It would simplify things since the launch function has no way to report back errors to the caller AFAICT.
19:03 karolherbst: ahh yeah
19:03 imirkin: pmoreau: the regs gotta live somewhere
19:03 karolherbst: clover also calculates the automatic local size based on reported values by the driver
19:03 karolherbst: globally
19:03 karolherbst: not per kernel
19:04 karolherbst: it needs some work all over the place
19:04 imirkin: pmoreau: i mean the total shared mem size. and subtract out any actually-used shared mem since that can't go towards regs.
19:04 karolherbst: so on nv50 regs == smem?
19:04 imirkin: nvidia
19:04 karolherbst: uhhhh
19:04 karolherbst: I think this explains why CL is designed this way
19:04 pmoreau: Really?
19:05 pmoreau: Shared mem and L1 cache, yes
19:05 pmoreau: But shared mem and register file?
19:05 karolherbst: well..
19:05 imirkin: karolherbst: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp#n265
19:05 karolherbst: compute + graphics wasn't a thing
19:05 karolherbst: so I can understand why they just reused the same thing
19:05 imirkin: look at the register count calculation for compute
19:05 imirkin: threads is set "elsewhere", but it's the local group size
19:06 karolherbst: imirkin: we don't know the threads when compiling :p
19:06 imirkin: i know
19:06 imirkin: so it gets set to the max allowed for "variable" groups in that case
19:06 imirkin: i just don't remember where we do that
19:06 karolherbst: but yeah
19:06 karolherbst: I am more or less familiar with the idea
19:07 karolherbst: but for CL we can just assume max allowed as we report back whatever we come up with
19:07 imirkin: karolherbst: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h#n177
19:07 imirkin: yeah, i mean the math can go the other way
19:07 imirkin: smsize / gpr count = max threads :)
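(A back-of-the-envelope version of that division, with assumed example numbers rather than what codegen hardcodes for any particular chipset:)

    /* Illustration of the "pool size / regs per thread = max threads" idea;
     * the 64 KiB pool and 4-byte GPR size are assumed example values only. */
    #include <stdio.h>

    int main(void)
    {
        unsigned pool_bytes      = 64 * 1024; /* assumed per-block register/smem pool */
        unsigned gprs_per_thread = 32;        /* what the compiler ended up using */
        unsigned bytes_per_gpr   = 4;

        unsigned max_threads = pool_bytes / (gprs_per_thread * bytes_per_gpr);
        printf("max threads per block: %u\n", max_threads); /* 512 with these numbers */
        return 0;
    }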
19:07 karolherbst: I think we have to fix it for some chipsets actually...
19:08 imirkin: it's actually fairly accurate.
19:08 karolherbst: SM37 has 128k regs
19:08 karolherbst: SM53 can only use 32k per block
19:08 imirkin: ok
19:08 karolherbst: SM62 as well
19:08 imirkin: well, feel free to update.
19:08 karolherbst: yeah...
19:08 imirkin: that's TX1/etc right?
19:08 karolherbst: those are the chipsets which don't matter
19:08 karolherbst: yes
19:09 imirkin: SM37 is the hypothetical GK210?
19:09 karolherbst: yes
19:09 karolherbst: Tesla K80
19:09 imirkin: yea. i'll believe it when i see it ;)
19:09 karolherbst: hence me caring very little
19:09 pmoreau: I’m relatively sure that I could use all the advertised shared memory (48KB) and still run kernels with >32 regs per thread without any spilling, on the Kepler Titan.
19:09 karolherbst: pmoreau: SM32?
19:10 karolherbst: ahh no
19:10 karolherbst: SM35
19:10 karolherbst: yeah...
19:10 pmoreau: SM35 sounds more likely, but it was a long time ago
19:10 karolherbst: imirkin: we can also adjust the max smem amount...
19:11 karolherbst: this is crazy volatile
19:11 imirkin: yeah.
19:11 imirkin: i tried to avoid all that.
19:11 imirkin: anyways, feel free to update
19:11 karolherbst: yeah.. doesn't really matter all that much for GL
19:11 karolherbst: but I was hitting some smem size issues on my tu116
19:11 karolherbst: *tu117
19:12 karolherbst: so I was inclined to look into it
19:12 imirkin: ok, go for it
19:12 karolherbst: at least nvidia publishes this information more or less correctly :)
19:14 imirkin: pmoreau: btw, are you planning to look at some of those patches?
19:14 pmoreau: I am
19:14 imirkin: i ended up including all the nv50_compute.c changes as well
19:15 imirkin: coz why not
19:15 imirkin: worst thing that happens is that we have to update it
19:15 imirkin: there are some upcoming changes that are needed for it, but they'll be largely incremental
19:15 pmoreau: Yeah, no worries
19:16 imirkin: (like adding the image surface info into uniforms somewhere, stuff like that)
19:28 pmoreau: imirkin: Would you like to open a MR/send the patches on the ML, or would you like me to comment directly on the commits?
19:29 imirkin: pmoreau: sigh ... the gitlab review thing is a disaster, unfortunately
19:29 imirkin: if you are going to have lots of feedback, i'll mail the patches and we can do a proper review
19:29 imirkin: if you have just a few small bits, we can use gitlab. i did open a MR for it already
19:31 imirkin: pmoreau: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9299
19:31 imirkin: also let me know if you think i screwed up the authorship on anything, happy to fix it up
19:32 imirkin: with commit splitting/etc it can be easy to screw it up
19:36 pmoreau: Duh 🤦 I was already on the merge request page, but on the commits view and therefore thought I was on a branch page…
22:36 imirkin: pmoreau: ideally i'll be able to iron out images and add shared atomic lowering over the weekend. unfortunately i can't easily test shared atomics since those are nva0+ only (and i have a g84 plugged in atm)
22:36 imirkin: i'm not even sure i have a g21x except for the gt215 with gddr5 which might have additional issues, so not good for feature enablement
23:32 damo22: imirkin: i have nva0 plugged in, if there is a way for me to help testing i can try to do that
23:33 imirkin: damo22: i'll let you know when i have patches
23:33 damo22: is there a test suite?
23:34 imirkin: yea, deqp
23:34 imirkin: i use the aosp one -- https://android.googlesource.com/platform/external/deqp/
23:34 imirkin: it might be part of VK-GL-CTS too now? not sure
23:34 damo22: interesting, so i would need to build mesa?
23:35 imirkin: to apply patches to mesa? yes.
23:35 imirkin: and also the test suite
23:35 damo22: is there a guide on how to run the compiled mesa with the test suite?
23:36 damo22: is it as simple as using a LD_LIBRARY_PATH ?
23:37 imirkin: pretty much
23:37 imirkin: i can supply some instructions closer to the fact
23:37 damo22: ok i will be in and out this weekend but if you ping me i will be around
23:37 imirkin: sounds good
23:37 imirkin: also no promises on this actually happening
23:38 imirkin: i'm going to be focusing on fixing things i can test myself first.
23:38 damo22: no probs
23:38 imirkin: so depends how long that takes
23:38 imirkin: (like adding at least array images, but ideally also 3d ... just need to wrap my head around tiling)
23:39 imirkin: need to mess with the tiling settings, and the y coord, i think
23:39 imirkin: and no one's the wiser
23:39 imirkin: maybe.
23:39 damo22: good luck
23:40 imirkin: thanks :)