16:57bamse: given the second check in nvkm_pmu_fan_controlled(), should i expect my gm204 controlling the fan automatically? or am i missing something?
16:57imirkin: bamse: you should.
16:57imirkin: GM20x requires signed firmware to control, among other things, fan speed
16:58imirkin: nvidia supplies some PMU firmware in linux-firmware which does it automatically without allowing the (OS) driver to do anything about it
16:58imirkin: bamse: that said, if it's a laptop, cooling is often controlled by the EC, and there is no nvidia-attached fan control
16:59bamse: imirkin: it's my desktop...and it's not doing any fan control
16:59bamse: just sits there at a nice 90-95 degrees
16:59imirkin: thank you for choosing nvidia. we appreciate you have a choice of gpu vendors, and you appear to have chosen the wrong one.
17:00bamse: but am i missing said firmware? or does it not exist?
17:00imirkin: it's in linux-firmware
17:00bamse: so it should be loaded automagically?
17:00imirkin: failure to load it would result in no acceleration
17:00imirkin: (and also some errors in dmesg)
17:00bamse: and a firmware loading error message in dmesg?
17:01imirkin: just run 'glxinfo' and see if you get nouveau reported as the driver
17:01imirkin: if you do, that means you have accel and thus firmware
17:01bamse: yeah, both dmesg and glxinfo indicate that it's working as expected
17:02imirkin: (just make sure glxinfo doesn't say llvmpipe...)
17:02bamse: nah, my vendor is "nouveau"
17:02imirkin: i have no idea what policy the rtos in the pmu is following
17:02imirkin: presumably something slightly vbios-driven? dunno
17:05bamse: imirkin: thanks for confirming my expectations :)
17:06imirkin: yeah, sorry i don't have better news for you. in the future, consider this experience when allocating funds towards a new gpu
17:07imirkin: some people have gone as far as to tear off the "real" fans and install directly controlled ones
17:07imirkin: (very few)
17:09imirkin: bamse: some day i'll go back to the apq8084 pcie thing :) i just got really confused, and then some other stuff happened, and now it's like a year later.
17:12bamse: imirkin: well, perhaps we can get DP working on my 8cx laptop and use that to drive my monitor instead...
17:12bamse: imirkin: and i'll be ready to review those pcie patches when you find the time :)
17:40pmoreau: Do we already have somewhere hardware limits like the maximum number of registers that can be used by a block?
17:41pmoreau: I have a commit (https://gitlab.freedesktop.org/pmoreau/mesa/-/commit/9c0c2829f8d937343e64d1e2282f23324e0fc3e9) where I avoid dispatching a kernel if we know it will go over some hardware limits, but I’m wondering if there is a better way than hardcoding the limits in Mesa.
18:09imirkin: pmoreau: we figure out the max number of regs in the compiler
18:09imirkin: pmoreau: probably incorrectly for nv50 :)
18:46karolherbst: imirkin: CL is stupid
18:46karolherbst: pmoreau: yeah, we have to fix clover first
18:47karolherbst: soo.. what we have to change in the gallium API is that compilation has to report back how many threads it can launch at most, or something
18:47imirkin: karolherbst: well, basically max number of regs is sizeof(shared mem) / local group size
18:47karolherbst: we don't know the local group size when compiling :)
18:48karolherbst: for GL yes, for CL no
18:48imirkin: i mean ... you do and you don't
18:48imirkin: there's a group size variable ext
18:48karolherbst: atm we do as we only compile to the hw ISA once we enqueue the kernel
18:48imirkin: which enables you to have API-set local group size
18:48karolherbst: imirkin: in CL we don't know until the kernel gets launched
18:48imirkin: and it's allowed to report a lower limit
18:49imirkin: than the "regular" limit
18:49karolherbst: sure, for GL it's all well defined
18:49imirkin: karolherbst: look at the variable one
18:49karolherbst: it's still better than for CL
18:50imirkin: the group size is part of the (new) DispatchComputeGroupSizeARB function
18:50karolherbst: imirkin: can you query the actual upper limit for a compiled shader?
18:50imirkin: you can query the upper limit for the "variable" shaders
18:50imirkin: most hw will report the most conservative number of regs
18:50imirkin: based on the max allowed group size
18:50karolherbst: so we don't even support that correctly in mesa
18:51karolherbst: as I am sure we have no gallium API to report it based on a compiled shader
18:51imirkin: let me rephrase
18:51imirkin: what i said is misleading
18:51imirkin: 1. a shader has to indicate it has "variable" group size
18:51imirkin: 2. when compiling, we assume that the "max" group size will be used, although there are other strategies
18:52imirkin: 3. there is a *separate* limit for the max variable group size, which may be lower than the max fixed group size
18:53karolherbst: sure, but I still don't think we have the actual gallium API to report better values
18:55karolherbst: or maybe I missed something, but when I looked into it for CL I think we didn't
18:57karolherbst: uhh, GL also has a crappy API
18:57karolherbst: so for gallium we have PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK
18:58karolherbst: and this is a driver constant
18:58karolherbst: but CL is.... stupid
18:58imirkin: it's a limit on what can be passed into the DispatchComputeGroupSizeARB function
18:58karolherbst: so, what you have to do for CL is: compile the kernel, calculate how big the blocks can be
18:58karolherbst: and the application can query it per shader
18:58imirkin: ah ok
18:58imirkin: well you'll need new api's for that
18:59karolherbst: CL by default also allows you to specify the block and group size
18:59karolherbst: group size without a limit :)
18:59karolherbst: whatever fits in usize_t
18:59karolherbst: but that's a different issue
19:00karolherbst: and CL requires 3 dimensions on the group :)
19:00karolherbst: same problem though
19:01karolherbst: pmoreau: so yeah.. we need to change the gallium interfaces a little
19:01karolherbst: applications can query the limits per kernel
19:01pmoreau: “basically max number of regs is sizeof(shared mem) / local group size” why is this dependent on shared memory? The register file should be separate.
19:01karolherbst: and clover should already block launching those
19:01karolherbst: problem is... we compile too late
19:01karolherbst: we essentially have to be able to compile without a context :/
19:02karolherbst: it's a huge mess
19:02pmoreau: True, and robclark had a series to add those per-kernel limits
19:02pmoreau: And in which case, it would make sense to have clover block those.
19:03karolherbst: hence r600 underreports
19:03karolherbst: assumes the worst case
19:03pmoreau: It would simplify things since the launch function has no way to report back errors to the caller AFAICT.
19:03karolherbst: ahh yeah
19:03imirkin: pmoreau: the regs gotta live somewhere
19:03karolherbst: clover also calculates the automatic local size based on reported values by the driver
19:03karolherbst: not per kernel
19:04karolherbst: it needs some work all over the place
19:04imirkin: pmoreau: i mean the total shared mem size. and subtract out any actually-used shared mem since that can't go towards regs.
19:04karolherbst: so on nv50 regs == smem?
19:04karolherbst: I think this explains why CL is designed this way
19:05pmoreau: Shared mem and L1 cache, yes
19:05pmoreau: But shared mem and register file?
19:05imirkin: karolherbst: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp#n265
19:05karolherbst: compute + graphics wasn't a thing
19:05karolherbst: so I can understand why they just reused the same thing
19:05imirkin: look at the register count calculation for compute
19:05imirkin: threads is set "elsewhere", but it's the local group size
19:06karolherbst: imirkin: we don't know the threads when compiling :p
19:06imirkin: i know
19:06imirkin: so it gets set to the max allowed for "variable" groups in that case
19:06imirkin: i just don't remember where we do that
19:06karolherbst: but yeah
19:06karolherbst: I am more or less familiar with the idea
19:07karolherbst: but for CL we can just assume max allowed as we report back whatever we come up with
19:07imirkin: karolherbst: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h#n177
19:07imirkin: yeah, i mean the math can go the other way
19:07imirkin: smsize / gpr count = max threads :)
19:07karolherbst: I think we have to fix it for some chipsets actually...
19:08imirkin: it's actually fairly accurate.
19:08karolherbst: SM37 has 128k regs
19:08karolherbst: SM53 can only use 32k per block
19:08karolherbst: SM62 as well
19:08imirkin: well, feel free to update.
19:08imirkin: that's TX1/etc right?
19:08karolherbst: those are the chipsets which don't matter
19:09imirkin: SM37 is the hypothetical GK210?
19:09karolherbst: Tesla K80
19:09imirkin: yea. i'll believe it when i see it ;)
19:09karolherbst: hence me caring very little
19:09pmoreau: I’m relatively sure that I could use all the advertised shared memory (48KB) and still run kernels with >32 regs per thread without any spilling, on the Kepler Titan.
19:09karolherbst: pmoreau: SM32?
19:10karolherbst: ahh no
19:10pmoreau: SM35 sounds more likely, but it was a long time ago
19:10karolherbst: imirkin: we can also adjust the max smem amount...
19:11karolherbst: this is crazy volatile
19:11imirkin: i tried to avoid all that.
19:11imirkin: anyways, feel free to update
19:11karolherbst: yeah.. doesn't really matter all that much for GL
19:11karolherbst: but I was hitting some smem size issues on my tu116
19:12karolherbst: so I was inclined to look into it
19:12imirkin: ok, go for it
19:12karolherbst: at least nvidia publishes this information more or less correctly :)
19:14imirkin: pmoreau: btw, are you planning to look at some of those patches?
19:14pmoreau: I am
19:14imirkin: i ended up including all the nv50_compute.c changes as well
19:15imirkin: coz why not
19:15imirkin: worst thing that happens is that we have to update it
19:15imirkin: there are some upcoming changes that are needed for it, but they'll be largely incremental
19:15pmoreau: Yeah, no worries
19:16imirkin: (like adding the image surface info into uniforms somewhere, stuff like that)
19:28pmoreau: imirkin: Would you like to open a MR/send the patches on the ML, or would you like me to comment directly on the commits?
19:29imirkin: pmoreau: sigh ... the gitlab review thing is a disaster, unfortunately
19:29imirkin: if you are going to have lots of feedback, i'll mail the patches and we can do a proper review
19:29imirkin: if you have just a few small bits, we can use gitlab. i did open a MR for it already
19:31imirkin: pmoreau: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9299
19:31imirkin: also let me know if you think i screwed up the authorship on anything, happy to fix it up
19:32imirkin: with commit splitting/etc it can be easy to screw it up
19:36pmoreau: Duh 🤦 I was already on the merge request page, but on the commits view and therefore thought I was on a branch page…
22:36imirkin: pmoreau: ideally i'll be able to iron out images and add shared atomic lowering over the weekend. unfortunately i can't easily test shared atomics since those are nva0+ only (and i have a g84 plugged in atm)
22:36imirkin: i'm not even sure i have a g21x except for the gt215 with gddr5 which might have additional issues, so not good for feature enablement
23:32damo22: imirkin: i have nva0 plugged in, if there is a way for me to help testing i can try to do that
23:33imirkin: damo22: i'll let you know when i have patches
23:33damo22: is there a test suite?
23:34imirkin: yea, deqp
23:34imirkin: i use the aosp one -- https://android.googlesource.com/platform/external/deqp/
23:34imirkin: it might be part of VK-GL-CTS too now? not sure
23:34damo22: interesting, so i would need to build mesa?
23:35imirkin: to apply patches to mesa? yes.
23:35imirkin: and also the test suite
23:35damo22: is there a guide on how to run the compiled mesa with the test suite?
23:36damo22: is it as simple as using a LD_LIBRARY_PATH ?
23:37imirkin: pretty much
23:37imirkin: i can supply some instructions closer to the fact
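Until then, the usual recipe for running a locally built mesa looks roughly like this — paths and binary names are examples, and the deqp invocation depends on how the suite was built:

```shell
# Build mesa into a local prefix (paths illustrative)
meson setup build --prefix="$PWD/build/install" -Dgallium-drivers=nouveau
ninja -C build install

# Point the GL loader at the local build instead of the system mesa
export LD_LIBRARY_PATH="$PWD/build/install/lib"
export LIBGL_DRIVERS_PATH="$PWD/build/install/lib/dri"

# Sanity-check which driver is picked up, then run the suite
glxinfo | grep "OpenGL renderer"
./deqp-gles31 --deqp-caselist-file=caselist.txt
```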
23:37damo22: ok i will be in and out this weekend but if you ping me i will be around
23:37imirkin: sounds good
23:37imirkin: also no promises on this actually happening
23:38imirkin: i'm going to be focusing on fixing things i can test myself first.
23:38damo22: no probs
23:38imirkin: so depends how long that takes
23:38imirkin: (like adding at least array images, but ideally also 3d ... just need to wrap my head around tiling)
23:39imirkin: need to mess with the tiling settings, and the y coord, i think
23:39imirkin: and no one's the wiser
23:39damo22: good luck
23:40imirkin: thanks :)