00:24 misyltoad[d]: airlied[d]: that'd be cool, either that or destroying v4l2 so it never has to be touched by anyone again 🐸
00:25 misyltoad[d]: so many painful memories on qcom :<
00:29 airlied[d]: I do wonder with a lot of the arm decode hw whether any of it has queues or it all needs to be fed from the kernel
00:32 HdkR: TIL that wine supports v4l2 because someone opened a pull request for it.
00:32 HdkR: At least that is sticking in its webcam lane as expected.
00:35 misyltoad[d]: airlied[d]: i think it's all stateful, even in iris :(
00:39 misyltoad[d]: i do wish there was a better api than v4l2 for stateful encode/decode... like literally anything would be better than what we have now :(((
00:41 misyltoad[d]: vk to v4l2 might solve a lot of my concerns but i'm doubtful about a decent amount of the qbuf etc. interactions
00:41 misyltoad[d]: curious as to your plans there
01:11 esdrastarsis[d]: phomes_[d]: I think this can be closed: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13606
01:14 phomes_[d]: thanks. I closed it
08:08 linkmauve: misyltoad[d], what do you mean by qbuf interactions? VIDIOC_QBUF is the only way to queue a buffer in V4L2 AFAIK.
08:10 linkmauve: airlied[d], the rk3588 for instance has two instances of its H.264/H.265/VP9 decoder, which must be combined together to reach 8K60 decoding, and it also has four instances of its JPEG encoder. Currently only one instance of each is exposed by the dts, because the kernel is in a better place than userland to do e.g. round-robin between different processes.
08:12 linkmauve: HdkR, for cameras, pipewire and libcamera seem like a better option than V4L2, which only supports the simple cameras most devices stopped using in the past decade.
08:13 linkmauve: Nowadays you have a very complex V4L2 graph described in a /dev/media* device, where you must build the pipeline manually from the sensor to the ISP to the encoder with a bunch of feedback for 3A and more.
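What linkmauve describes, building the pipeline by hand from the /dev/media* device, looks roughly like this at the ioctl level. A minimal sketch, assuming the entity and pad IDs were already discovered (e.g. via MEDIA_IOC_G_TOPOLOGY); tools like media-ctl wrap essentially this call:

```c
/* Enable one sensor -> ISP link through the media controller API.
 * The entity/pad values are placeholders that would normally come
 * from enumerating the graph first. */
#include <sys/ioctl.h>
#include <linux/media.h>

int enable_link(int media_fd, __u32 src_entity, __u16 src_pad,
                __u32 sink_entity, __u16 sink_pad)
{
    struct media_link_desc link = {
        .source = { .entity = src_entity,  .index = src_pad },
        .sink   = { .entity = sink_entity, .index = sink_pad },
        .flags  = MEDIA_LNK_FL_ENABLED,
    };

    return ioctl(media_fd, MEDIA_IOC_SETUP_LINK, &link);
}
```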
10:49 Lynne: instances being combined together? frame-wise, or tile-wise?
11:07 avhe[d]: if it's anything like nvdec (probably is since i believe both rockchip and nv use hantro IP), it's two separate hardware engines that can operate in parallel (assuming no reference dependency of course)
11:08 avhe[d]: the nvidia userspace driver creates two separate queues and does the scheduling itself
11:30 ermine1716[d]: > pipewire and libcamera seems like a better option than V4L2
11:30 ermine1716[d]: So one should use pipewiresrc instead of v4l2src in gstreamer?
11:59 misyltoad[d]: linkmauve: i mean, you can have multiple dpb/bitstream buffers/etc per queue, no? that doesn't mesh well with the stateful encoder model
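For context on the model being criticized here: in a stateful V4L2 decoder, compressed bitstream buffers go in on the OUTPUT queue and finished frames come back on the CAPTURE queue, while the driver owns the DPB internally. A minimal sketch of that flow (REQBUFS/mmap setup omitted, not tied to any particular driver):

```c
/* Stateful-decoder buffer flow: queue bitstream on OUTPUT,
 * dequeue decoded frames from CAPTURE. The DPB is managed by
 * the driver, which is the mismatch being discussed above. */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

int queue_bitstream(int fd, unsigned index, unsigned bytesused)
{
    struct v4l2_plane plane = { .bytesused = bytesused };
    struct v4l2_buffer buf;

    memset(&buf, 0, sizeof(buf));
    buf.type     = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
    buf.memory   = V4L2_MEMORY_MMAP;
    buf.index    = index;
    buf.m.planes = &plane;
    buf.length   = 1;

    return ioctl(fd, VIDIOC_QBUF, &buf);
}

int dequeue_frame(int fd, struct v4l2_buffer *buf, struct v4l2_plane *plane)
{
    memset(buf, 0, sizeof(*buf));
    buf->type     = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
    buf->memory   = V4L2_MEMORY_MMAP;
    buf->m.planes = plane;
    buf->length   = 1;

    return ioctl(fd, VIDIOC_DQBUF, buf);
}
```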
18:01 jja2000[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1402350782476718140/IMG_20250805_200047_472.jpg?ex=68939870&is=689246f0&hm=2ffdf99a0c1870d5901964c3c5ba2dbb983a9e7fcab2898017229a2237c02c70&
18:01 jja2000[d]: gfxstrand[d]: cc that uboot size diff post
18:03 jja2000[d]: Last time I tested the TX2, the images shipped with uboot-images-armv8 did fit, but idk about the LNX partition size on it compared to the nano
18:03 gfxstrand[d]: <a:shrug_anim:1096500513106841673>
18:04 karolherbst[d]: can't the size be changed?
18:04 gfxstrand[d]: I don't have a TX2
18:04 gfxstrand[d]: I've got TX1, Nano, and Xavier on my desk
18:04 gfxstrand[d]: And my TX1 is funky and the GPU doesn't get detected
18:04 gfxstrand[d]: I'm hoping for more luck with the Nano but so far it's been a mess
18:07 jja2000[d]: gfxstrand[d]: Upstream DTS doesn't have the GPU node set to enabled it seems
18:07 jja2000[d]: It does on the nano
18:07 gfxstrand[d]: Yeah. I was running with a patched DTS that fixes that
18:07 jja2000[d]: Alright, I remember that
18:07 gfxstrand[d]: marysaka[d]'s TX1 works great. Mine doesn't. Same config.
18:07 karolherbst[d]: the partition table is somewhere in a script there
18:08 gfxstrand[d]: I'm trying with F41 now. That's what worked on my TX1
18:09 jja2000[d]: Good luck, I'm hopeful
18:09 jja2000[d]: Do keep a UART connection handy though, TX2 didn't have uart on display nor a happily functioning gpu in userspace :^)
18:11 gfxstrand[d]: Yeah, that's why I'm mucking about on my Nano instead of my Switch. I have a UART. 🙂
18:13 jja2000[d]: Yeah, joycons are a bit too expensive to go soldering to the rail
18:18 steel01[d]: jja2000[d]: https://lore.kernel.org/all/20250420-tx1-gpu-v1-1-d500de18e43e@gmail.com/
18:18 steel01[d]: It should now. I fixed that.
18:18 steel01[d]: Maybe hasn't been long enough to filter to distros yet, though.
18:25 steel01[d]: https://gitlab.incom.co/CM-Shield/android_kernel_nvidia_kernel/-/commit/2fcc2dca9d252e08e81a0a3e2cf26b04da13ce18
18:25 steel01[d]: Question for those familiar with the kernel driver. I'm trying to add basic devfreq support for tegra in nouveau. I'm not familiar with this driver's code at all, though, and I'm having trouble figuring parts of it out, and the current state of the linked commit makes that obvious. Namely: where do tasks get submitted? The other drivers that plumb devfreq build ondemand stats from the respective
18:25 steel01[d]: task scheduler. nouveau_sched is apparently something completely different.
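For reference, the plumbing steel01[d] is after usually follows the pattern of other GPU drivers (panfrost, msm): a devfreq_dev_profile registered with the simple_ondemand governor and fed busy/total time sampled around job submission. A rough sketch of that shape; every nouveau_* name here is hypothetical, not actual nouveau code:

```c
#include <linux/devfreq.h>
#include <linux/err.h>
#include <linux/pm_opp.h>

static int nouveau_devfreq_target(struct device *dev, unsigned long *freq,
                                  u32 flags)
{
    /* Snap the request to a valid OPP, then program the clock via
     * the existing reclocking code. */
    struct dev_pm_opp *opp = devfreq_recommended_opp(dev, freq, flags);

    if (IS_ERR(opp))
        return PTR_ERR(opp);
    dev_pm_opp_put(opp);
    /* ... set the GPU clock to *freq ... */
    return 0;
}

static int nouveau_devfreq_get_status(struct device *dev,
                                      struct devfreq_dev_status *status)
{
    /* busy/total time would be accumulated wherever jobs are pushed
     * to the hardware and reset on each poll; placeholders here. */
    status->busy_time  = 0;
    status->total_time = 0;
    return 0;
}

static struct devfreq_dev_profile nouveau_devfreq_profile = {
    .polling_ms     = 50,
    .target         = nouveau_devfreq_target,
    .get_dev_status = nouveau_devfreq_get_status,
};

int nouveau_devfreq_init(struct device *dev)
{
    struct devfreq *df = devm_devfreq_add_device(dev,
                                                 &nouveau_devfreq_profile,
                                                 DEVFREQ_GOV_SIMPLE_ONDEMAND,
                                                 NULL);

    return PTR_ERR_OR_ZERO(df);
}
```

The open question in the message above is exactly where the busy-time accounting hooks go, since nouveau's job submission doesn't map one-to-one onto the schedulers the other drivers sample.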
18:32 karolherbst[d]: the infra for most of it doesn't really exist in nouveau without GSP
18:32 karolherbst[d]: there is manual reclocking support, so users can choose the perf level, but nothing automatic
18:32 karolherbst[d]: the idea was to read the idle counters on the PMU and hook this all up
18:32 karolherbst[d]: but never managed to finish it
18:33 karolherbst[d]: it also never worked reliably enough, because reverse engineering reclocking is always kinda flaky given how flexible the hardware is there
18:36 karolherbst[d]: I have a bit of code to set up the counters and add PMU code to read them out/poll them: https://github.com/karolherbst/nouveau/commits/pmu_counters_v4/
18:36 steel01[d]: Reclocking for tegra is already there.
18:36 steel01[d]: For gk20a and gm20b, anyways.
18:38 steel01[d]: I'd like to get gp10b hooked up too. I'm not sure how different that's going to be compared to gm20b, though. Probably more than I can handle at my knowledge level.
18:50 jja2000[d]: steel01[d]: If you need me to try anything for gp10b, let me know
18:52 steel01[d]: I've got all the hardware. From t124 to t234. And t114, but that's a whole different can of worms for gpu. But if I do get something running, it would be good to have extra testing. Especially since my target is android, not desktop linux.
18:57 jja2000[d]: ahhh alright, for T114, the only person trying to get the GPU running went on to do his thesis and stopped working on it
18:58 jja2000[d]: He did go a little bit crazy trying to get it to work though, so maybe it's not that bad :^)
18:59 jja2000[d]: Also, about reclocking for GM20B, is that a recent thing? Last time I saw it, it was stuck at 30MHz by default
19:00 steel01[d]: 76MHz.
19:01 steel01[d]: There's a pstate node in debugfs.
19:01 steel01[d]: I'm trying to plumb devfreq to not have to use debugfs.
19:02 steel01[d]: jetson:/sys/kernel/debug/dri/57000000.gpu # cat pstate
19:02 steel01[d]: 01: core 76 MHz AC DC *
19:02 steel01[d]: 02: core 153 MHz
19:02 steel01[d]: 03: core 230 MHz
19:02 steel01[d]: 04: core 307 MHz
19:02 steel01[d]: 05: core 384 MHz
19:02 steel01[d]: 06: core 460 MHz
19:02 steel01[d]: 07: core 537 MHz
19:02 steel01[d]: 08: core 614 MHz
19:02 steel01[d]: 09: core 691 MHz
19:02 steel01[d]: 0a: core 768 MHz
19:02 steel01[d]: 0b: core 844 MHz
19:02 steel01[d]: 0c: core 921 MHz
19:02 steel01[d]: 0d: core 998 MHz
19:02 steel01[d]: AC: core 76 MHz
19:03 steel01[d]: Note that the top state is unstable. I've been setting 0a/0b as the soft cap to keep my unit from panicking.
19:06 steel01[d]: It's not recent, though. Gnurou (Alexandre Courbot) wrote it in the initial nouveau submission, back in like 2015. He added pstates to gk20a as well. But when he pushed gp10b support, there was no pstate support. I've got an email out to him asking if there was a blocker reason for that or just a time crunch. It normally takes 2-3 weeks for him to reply, though. So I've still got a while to wait.
19:07 jja2000[d]: Ah, got it
19:09 jja2000[d]: t210 also lacked cpufreq iirc; t124-cpufreq -ENOENTs on finding out it's a t210
19:10 steel01[d]: ? That always worked fine for me.
19:10 jja2000[d]: Someone I was in contact with had ported the downstream 5.15 driver but disappeared off the map
19:10 djdeath3483[d]: gfxstrand[d]: would you have a minute to look into https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36080 ?
19:10 steel01[d]: jja2000[d]: ```
19:10 steel01[d]: jetson:/sys/devices/system/cpu/cpufreq/policy0 # ls
19:10 steel01[d]: affected_cpus cpuinfo_max_freq cpuinfo_transition_latency scaling_available_frequencies scaling_cur_freq scaling_governor scaling_min_freq stats
19:10 steel01[d]: cpuinfo_cur_freq cpuinfo_min_freq related_cpus scaling_available_governors scaling_driver scaling_max_freq scaling_setspeed
19:10 steel01[d]: ```
19:11 gfxstrand[d]: djdeath3483[d]: What's wrong with it?
19:11 djdeath3483[d]: gfxstrand[d]: this is causing a pretty big regression (hangs) for us, and reading the English sentences of the memory model doesn't help with understanding it 🙂
19:12 gfxstrand[d]: karolherbst[d]: You might want to check that out. It might help with some of the benchmarks you've been looking at
19:12 djdeath3483[d]: gfxstrand[d]: I think there is maybe a misunderstanding that atomicStore/atomicLoad are treated as 2 instructions in the SPIRV->NIR translation and that you can put the load/store on different sides of the barrier
19:12 jja2000[d]: steel01[d]: I might be misreading https://github.com/torvalds/linux/blob/master/drivers/cpufreq/tegra124-cpufreq.c#L208
19:13 karolherbst[d]: gfxstrand[d]: the barrier stuff?
19:13 djdeath3483[d]: yeah
19:13 djdeath3483[d]: it's hanging a fair bunch of BVH compute shaders on Anv
19:14 karolherbst[d]: doesn't change a thing really, the issues we are having are a bit more fundamental than that
19:14 steel01[d]: jja2000[d]: Bail if it isn't one of those three.
19:14 karolherbst[d]: ohh, mhh
19:14 djdeath3483[d]: karolherbst[d]: I think it effectively removes some barriers between store & load
19:14 jja2000[d]: steel01[d]: Then I misread it, the other dude had to port it for whatever reason. Oh well
19:15 karolherbst[d]: djdeath3483[d]: I hope you aren't running the barrier opt passes after lowering io
19:16 djdeath3483[d]: nir_opt_barrier_modes? I think we run it pretty early
19:16 karolherbst[d]: yeah
19:16 karolherbst[d]: but also the other one
19:16 karolherbst[d]: they only work on derefs, and without derefs it's removing too much
19:17 djdeath3483[d]: no we run it before lowering io
19:17 djdeath3483[d]: but the spirv change also removes stuff
19:17 djdeath3483[d]: which is my main issue
19:17 karolherbst[d]: but the perf issue I'm having with the coop-matrix shaders is, that they do barriers within loops and we unroll loops too late
19:17 karolherbst[d]: djdeath3483[d]: ahh.. pain
19:19 karolherbst[d]: gfxstrand[d]: I already have 7 MRs in the pipeline that improve performance 😄
19:20 djdeath3483[d]: karolherbst[d]: or rather it moves things around a bit: the pair of barriers before/after the load/store gets combined into a single one only before/after
19:23 karolherbst[d]: mhhh, the good thing (or at least good for me) is that I'm no expert on barrier semantics at all 🙃 so I can't really comment on the MR being correct or not
19:27 djdeath3483[d]: I'm trying to understand how it can work on our HW and I don't see it really 🙂
19:28 djdeath3483[d]: filed https://gitlab.freedesktop.org/mesa/mesa/-/issues/13662
19:30 karolherbst[d]: djdeath3483[d]: can always be a bug in the kernels
19:31 djdeath3483[d]: yeah, starting to wonder 🙂
19:31 djdeath3483[d]: if they inverted the makeAvailable/makeVisible
19:31 karolherbst[d]: or a translator bug 🙃
19:32 djdeath3483[d]: translation seems to match what is in the glsl code
19:32 karolherbst[d]: ahh, at least that
19:32 karolherbst[d]: I should wire up all those advanced atomic features in CL at some point, so I get more testing on that part....
19:35 airlied[d]: I think I did spot a coopmat bug with barrier ordering on stores from our spirv to nir code
19:36 airlied[d]: But this doesn't sound like that
19:45 gfxstrand[d]: djdeath3483[d]: I'm not seeing anything wrong with it off-hand.
19:45 gfxstrand[d]: I'm not totally sure about the 3rd patch, though.
19:55 chikuwad[d]: ~~spent~~ wasted an hour figuring out why kate was replacing tabs with spaces in my code when saving and fucking up the git diff
19:55 chikuwad[d]: guess the culprit
19:56 chikuwad[d]: spoiler warning: it was clang-format
20:17 djdeath3483[d]: gfxstrand[d]: it doesn't bother you that now you can have barrier(); store(); load(); barrier(); ?
20:18 djdeath3483[d]: gfxstrand[d]: whereas previously it was barrier(); store(); barrier(); barrier(); load(); barrier();
20:18 djdeath3483[d]: the 2 in the middle would likely get optimized away
20:18 djdeath3483[d]: gfxstrand[d]: that would be as a result of atomicStore(); atomicLoad();
20:20 gfxstrand[d]: Yeah, I think the lots of barriers case was atomic load/store
20:20 gfxstrand[d]: But also I have a feeling we did it backwards for a reason. If only I could remember what that reason was...
20:38 djdeath3483[d]: yeah, something is odd... I just can't think of a way we can make 2 invocations of the same workgroup see each other's atomicStore() in the following atomicLoad() in the current translation
20:38 djdeath3483[d]: some L1 cache has to be flushed
20:39 djdeath3483[d]: and so if there is no barrier it cannot happen
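The ordering half of this debate can be written down with C11 atomics as an analogy: a release store wants its fence before the store, an acquire load wants its fence after the load, which is exactly the barrier(); store(); ... load(); barrier(); shape from above. Whether an incoherent L1 additionally needs a flush between the two so the other invocation can observe the store at all (djdeath3483[d]'s point) is a separate hardware question. A sketch:

```c
/* C11 analogy for release/acquire fence placement. */
#include <stdatomic.h>

atomic_int flag;
int payload;

void producer(void)
{
    payload = 42;                              /* plain write          */
    atomic_thread_fence(memory_order_release); /* fence BEFORE store   */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

int consumer(void)
{
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ;                                      /* spin on the flag     */
    atomic_thread_fence(memory_order_acquire); /* fence AFTER the load */
    return payload;                            /* guaranteed to see 42 */
}
```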
20:49 gfxstrand[d]: Yeah, Intel needs a barrier there
20:49 gfxstrand[d]: And if the client is setting ACQ_REL on both, you should get one
20:53 djdeath3483[d]: it's only atomicStore(REL) atomicLoad(ACQ)
20:54 gfxstrand[d]: Yeah, you should get a barrier there
20:58 djdeath3483[d]: I'll propose a revert in the morning with some explanation why this doesn't work for us
20:58 gfxstrand[d]: It could be that brw is ignoring `COHERENT` and ACO isn't
20:59 gfxstrand[d]: All the brw stuff was designed assuming we turn op decorations into barriers. I think passes have been added/modified which don't fit that assumption
21:00 djdeath3483[d]: it is ignorant about COHERENT
21:00 djdeath3483[d]: I was meant to fix this
21:01 djdeath3483[d]: will try tomorrow
21:01 gfxstrand[d]: The current state of NIR re these things isn't great
21:01 gfxstrand[d]: We don't want to do barriers everywhere because most hardware can do better. But it was all designed assuming Intel which just can't (or at least couldn't at the time).
21:01 gfxstrand[d]: Except people have come along and said "We can do better!" and I'm not convinced it's been done holistically
21:02 gfxstrand[d]: Which probably means I need to give it all a very good long look
21:03 snowycoder[d]: Huh, 431a9508a0242fec7154121c1a1d06c83c2c4463 breaks most additions pre-Volta (it adds iadd3 but the GPU doesn't support it).
21:03 snowycoder[d]: I'm fixing it.
21:03 snowycoder[d]: Just a question, can I add memes in Merge Requests?
21:04 djdeath3483[d]: gfxstrand[d]: I think pre-DG2 cannot
21:04 djdeath3483[d]: it's all funky with sparse residency there for the same reason
21:08 gfxstrand[d]: snowycoder[d]: Yes but don't go overboard. A meme isn't a commit message or a merge request description. It can be fun but you need real text in there, too.
21:11 gfxstrand[d]: snowycoder[d]: Yeah, that needed to check for sm >= 70
21:11 gfxstrand[d]: or has_iadd3
21:11 gfxstrand[d]: has_iadd3 would be better
21:12 snowycoder[d]: I chose that exact name :3
21:18 snowycoder[d]: Simple fix opened: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36588
22:35 mangodev[d]: snowycoder[d]: so weird seeing this in the nouveau chat 🥴
22:37 snowycoder[d]: mangodev[d]: Sorry, I use it everywhere 😅
22:38 mangodev[d]: snowycoder[d]: i refrain from :3'ing in this chat because i see it as people's workplace
22:38 mangodev[d]: …but now that i think about it, we all type in lowercase, put memes in MRs, and a good few quarters of the active nouveau team are trans
22:38 mangodev[d]: so this definitely isn't a disgruntled microsoft department that has worked on the same product since the 80s (it could even be said we're a little youthful)
22:43 mangodev[d]: idk what to think on it because, to be honest, i have not been in a workplace before (😰)
22:43 mangodev[d]: and i do not know the dynamic of this workplace
22:45 mangodev[d]: and i guess there isn't really a strict structure here
22:45 mangodev[d]: it's not like karol is gonna fire me for saying :3 in an MR comment (…right?)
22:48 snowycoder[d]: In my "workplace" (University researcher), things are really lax, to the point that commit messages don't say anything (things like "Commit") or are in Italian.
22:48 snowycoder[d]: So I don't know either, but I think git history and code are what count, and using memes or emojis in the chat just helps with the human factor.
22:48 snowycoder[d]: But, well, I'm the newcomer here, if I'm doing something wrong please tell me.
22:50 gfxstrand[d]: mangodev[d]: Pretty much. Keep stuff SFW, try not to say stuff that would make someone else personally uncomfortable, but otherwise I wouldn't sweat it. I try to keep things light and good natured.
22:54 gfxstrand[d]: snowycoder[d]: This is a pretty good example. The commit message is descriptive and informative so nothing gets lost but he's still having a little fun with it.
22:57 gfxstrand[d]: But if I bisect something to a commit with a one-line commit message that's part of an MR that just says "THE BEST REFACTOR EVAR!", that's when I'm likely to get pissed.
22:58 zmike[d]: I feel attacked
23:03 airlied[d]: seems justified 🙂
23:36 mhenning[d]: zmike[d]: she said she'd only be annoyed if your commit shows up in a bisect, so you're fine as long as you never write any bugs 😛
23:40 zmike[d]: I only write bugs
23:52 x512[m]: zmike[d]: I am waiting for the continuation of the "THE BEST REFACTOR EVAR!" patch series. There is no reason to keep DRI anymore.