00:05dwlsalmeida[d]: @averne man, looking at the h265 V4L2 hantro driver now, I am now sure that it's the same IP being used by NVIDIA GPUs
00:08dwlsalmeida[d]: apparently they've already figured out the size of the SAO, FILTER and other buffers some 4 years ago:
00:08dwlsalmeida[d]: https://elixir.bootlin.com/linux/v6.12.6/source/drivers/media/platform/verisilicon/hantro_hevc.c#L120
00:33gfxstrand[d]: dwlsalmeida[d]: https://gitlab.freedesktop.org/gfxstrand/mesa/-/commits/for/dwlsalmeida?ref_type=heads
00:34gfxstrand[d]: IDK that that will make any perf difference but it's a lot simpler.
00:35dwlsalmeida[d]: Thanks! I’ll push that to the MR
00:37gfxstrand[d]: Oh, don't. It doesn't build yet because I changed the name of the function without changing the callers
00:37gfxstrand[d]: I haven't even build tested
00:37gfxstrand[d]: But do give it a look
00:41gfxstrand[d]: Oh, and it now returns void
00:52gfxstrand[d]: But yeah that gets rid of at least one memcpy in your path and makes Rust and C interleave properly.
05:10gfxstrand[d]: Actually, it might not get rid of a memcpy but it's still better.
05:24gfxstrand[d]: airlied[d]: what's the status of nouveau on Grace/Hopper?
05:25airlied[d]: needs a fair bit of work, the boot sequence is a bit different to geforce
05:26airlied[d]: but it might have converged a bit with blackwell
09:06avhe[d]: dwlsalmeida[d]: Interesting, I didn't notice this when I was working on it. It matches down to the "HW guys wanted to have this" 😂
09:06avhe[d]: <https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/video/nvdec_drv.h#L182>
09:06avhe[d]: <https://elixir.bootlin.com/linux/v6.12.6/source/drivers/media/platform/verisilicon/hantro_hevc.c#L262>
13:52gfxstrand[d]: airlied[d]: But it's still GSP? Not Tegra weirdness.
13:56gfxstrand[d]: Trying to get a T-shirt size on what it'll take to bring up one of those neat little DIGITS boxes.
14:09Jasper[m]: @_oftc_gfxstrand[d]:matrix.org is it still different on the jetson models that do have GPUs whose desktop equivalent has GSP?
14:10Jasper[m]: Or, do Turing/Ampere Tegras lack the GSP mechanism/way of working that the desktop/laptop GPUs do have
14:10mohamexiety[d]: gfxstrand[d]: that would be so cool and awesome honestly. those boxes looked really neat
14:18HdkR: I'm hoping those type-c ports are Thunderbolt, not holding my breath since Thunderbolt + ARM hasn't been very successful under Linux so far.
14:46gfxstrand[d]: HdkR: Why? So you can plug a Radeon into it?
14:48gfxstrand[d]: 😝
14:50HdkR: gfxstrand[d]: Of course
14:51HdkR: NVIDIA has yet to Grace me, so I need an alternative. And Thor is taking forever
14:53gfxstrand[d]: Yeah, the Thor boards are going to be interesting as well. They're claiming massive speedups over Orin but that may just be for AI stuff.
14:53Jasper[m]: HdkR: https://youtu.be/3GDr-eyhsSM (loud)
14:53HdkR: At the very least Thor won't have bugged atomics like Orin
14:54HdkR: But I still tend to use the Orin because the alternatives also have compromises
14:54gfxstrand[d]: What'd they screw up with atomics?
14:54gfxstrand[d]: That seems very not good
14:56HdkR: Cortex-A78AE that NVIDIA ships in Orin has an erratum (1951502) where all atomic acquires without release (including LRCPC) have incorrect behaviour. So they patch the instructions to do a dmb+acquire
14:58HdkR: https://www.youtube.com/watch?v=q6-ZwiOb_B0 Last ten seconds of that game, running at around 15FPS because one core is hammering atomic loads. Testing on Cortex-A78C, the same game was vsync locked to 60 with that thread only consuming ~40% CPU time
14:59HdkR: Hardware errata so can't really work around it ¯\_(ツ)_/¯
15:13gfxstrand[d]: Ugh...
15:13gfxstrand[d]: That's pretty terrible
15:17HdkR: You can see why I'm trying to get off the platform :D
15:57gfxstrand[d]: Yeah
18:58phomes_[d]: I finally managed to narrow down this game rendering issue I am debugging. I might need a hint of what to do to fix it though
18:58phomes_[d]: The issue is in a shader that does this:
18:58phomes_[d]: ```float mask = texture(map0Tex, input.UV0.xy).w;
18:58phomes_[d]: float delta = 0.75 * fwidth(mask);
18:58phomes_[d]: output.w *= smoothstep(params.x - delta, params.y + delta, mask);```
18:59phomes_[d]: It works most of the time but when we are in an area where the pixel and the surrounding pixels all have .w = 0 then the output of smoothstep is a random value
18:59phomes_[d]: However, if I modify the shader to not multiply by 0.75 then we always get the correct output
19:11airlied[d]: gfxstrand[d]: yes still proper GSP not Tegra at all
19:12airlied[d]: Though the fact they are producing those seems to imply they have excess GH dies 🙂
19:15gfxstrand[d]: No, DIGITS is GB
19:16gfxstrand[d]: They might be cutting GHs in half and strapping on a Blackwell for all I know but it's not Hopper.
19:31gfxstrand[d]: phomes_[d]: That's wonky... Could be derivatives going wrong but that seems unlikely. Could be smoothstep being weirdly broken.
20:00mhenning[d]: Does anyone have games that fail with a generic
20:00mhenning[d]: nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:64 type:13 scope:1 part:233
20:00mhenning[d]: nouveau 0000:01:00.0: fifo:c00000:0008:0040:[hw_tests::test_[14761]] errored - disabling channel
20:00mhenning[d]: in dmesg?
20:00mhenning[d]: I'm curious if https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32952 fixes anything in particular
20:01mhenning[d]: Looking at the spreadsheet, Fallout 76 is a possible candidate
20:02tiredchiku[d]: I might, will check tomorrow
21:57mohamexiety[d]: gfxstrand[d]: Nah the CPU is different as well
21:57mohamexiety[d]: It’s X925 + A725. Completely new SoC made in collaboration with MediaTek
21:58mohamexiety[d]: I don’t know why they call the CPU side Grace tbh but I guess that’s their name for their CPUs now or something
21:59mohamexiety[d]: The GPU side is interesting from our pov too since apparently it’s closer to DC Blackwell rather than GeForce one? But details are vague so it’s hard to say
22:03phomes_[d]: mhenning[d]: I have this one in-game on Path of exile:
22:03phomes_[d]: ```nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:120 type:31 scope:1 part:233
22:03phomes_[d]: nouveau 0000:01:00.0: fifo:c00000:000f:0078:[PathOfExileStea[161985]] errored - disabling channel```
22:03phomes_[d]: But it is still there with your branch
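A small sketch of how the paired dmesg lines quoted above could be matched up when collecting reports like these. In the two samples in this log, the decimal `chid` on the `gsp: rc` line equals the last hex field of the `fifo:` line (64 = 0x40, 120 = 0x78); treating that as a general rule is an assumption, and `pair_faults` and both regexes are hypothetical helpers, not part of nouveau or Mesa.

```python
import re

# Patterns derived only from the two sample pairs quoted in this log.
RC_RE = re.compile(r"gsp: rc engn:(?P<engn>[0-9a-f]+) chid:(?P<chid>\d+) type:(?P<type>\d+)")
FIFO_RE = re.compile(
    r"fifo:[0-9a-f]+:[0-9a-f]+:(?P<chid_hex>[0-9a-f]+):"
    r"\[(?P<name>.+)\[(?P<pid>\d+)\]\] errored"
)

def pair_faults(lines):
    """Return (process name, pid, channel id, rc type) for each matched pair."""
    pending = {}  # channel id -> rc type seen on a "gsp: rc" line
    faults = []
    for line in lines:
        m = RC_RE.search(line)
        if m:
            pending[int(m["chid"])] = int(m["type"])
            continue
        m = FIFO_RE.search(line)
        if m:
            # Assumption: the last hex field is the same channel id as chid.
            chid = int(m["chid_hex"], 16)
            if chid in pending:
                faults.append((m["name"], int(m["pid"]), chid, pending.pop(chid)))
    return faults

LOG = [
    "nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:64 type:13 scope:1 part:233",
    "nouveau 0000:01:00.0: fifo:c00000:0008:0040:[hw_tests::test_[14761]] errored - disabling channel",
    "nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:120 type:31 scope:1 part:233",
    "nouveau 0000:01:00.0: fifo:c00000:000f:0078:[PathOfExileStea[161985]] errored - disabling channel",
]

faults = pair_faults(LOG)
for name, pid, chid, rc_type in faults:
    print(f"{name} (pid {pid}): channel {chid}, rc type {rc_type}")
```

Grouping by `type` this way makes it easy to see that the two reports here carry different values (13 and 31), so they are not necessarily the same failure.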
22:06gfxstrand[d]: mohamexiety[d]: That I'm not sure about. The device is intended to mimic the datacenter version but that doesn't mean they're the same. From a memory POV they should be. It's all shared between CPU and GPU. But whether the Blackwell chip is the same or not I don't know.
22:07gfxstrand[d]: mhenning[d]: Good catch!
22:11HdkR: Well UMA, but the GPU doesn't have any dedicated HBM, but still connected over nvlink-c2c. It's quite a different SoC, shouldn't really be compared to the silly 72 core server Grace :D
22:15gfxstrand[d]: phomes_[d]: Looking more, smoothstep is undefined if delta=0. We could probably degrade to step but IDK that we do.
22:16gfxstrand[d]: HdkR: Yeah, mimic from a software PoV. They don't care about mimicking hardware because who would be crazy enough to program the hardware directly?
22:16phomes_[d]: gfxstrand[d]: but fwidth should return 0 in both cases right? Only when we multiply 0 by 0.75 is when we get the problem
22:17HdkR: Yea, as long as you don't need to care about locality then it just becomes easier on the Mediatek SoC
22:17HdkR: Just slower because smaller GPU and fewer CPU cores
22:17gfxstrand[d]: I'm guessing the multiply sanitizes something or gets fused with something and messes with the precision.
22:34gfxstrand[d]: phomes_[d]: You'd have to look at the final NIR and NAK with and without the 0.75 to see what's getting optimized differently. It could be that we have an invalid transform somewhere and it could be that without the 0.75 we just get lucky
22:35phomes_[d]: I will do that. Thank you for the help
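A minimal sketch of the degenerate smoothstep case discussed above, emulated on the CPU. It uses the reference formula from the GLSL specification, which states that results are undefined when `edge0 >= edge1`. The sketch assumes `params.x == params.y` (the log does not say what `params` holds), so that with `delta = 0` both edges coincide. `ieee_div` and `smoothstep_or_step` are hypothetical helpers; the latter illustrates the "degrade to step" idea and is not what NAK or Mesa actually does.

```python
import math

def ieee_div(a, b):
    """Divide like IEEE 754 hardware: 0/0 gives NaN instead of raising."""
    try:
        return a / b
    except ZeroDivisionError:
        if a == 0.0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a) * math.copysign(1.0, b)

def clamp(t, lo, hi):
    # Both comparisons are false for NaN, so NaN passes straight through.
    if t < lo:
        return lo
    if t > hi:
        return hi
    return t

def smoothstep(edge0, edge1, x):
    """GLSL reference formula; undefined per spec when edge0 >= edge1."""
    t = clamp(ieee_div(x - edge0, edge1 - edge0), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)

def smoothstep_or_step(edge0, edge1, x):
    """Hypothetical guard: fall back to step() when the edges coincide."""
    if edge0 >= edge1:
        return 0.0 if x < edge0 else 1.0
    return smoothstep(edge0, edge1, x)

mask = 0.0            # texture .w is 0 across the whole quad
delta = 0.75 * 0.0    # fwidth(mask) is 0 when all neighbours match
px = py = 0.0         # assumption: params.x == params.y

print(smoothstep(px - delta, py + delta, mask))          # NaN from 0/0
print(smoothstep_or_step(px - delta, py + delta, mask))  # well-defined
```

This only shows why coinciding edges can produce garbage; it does not explain why dropping the `0.75` multiply changes the result, which is what comparing the final NIR and NAK output would reveal.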
23:40_lyude[d]: btw airlied[d] regarding the vblank issue you mentioned the other day - after talking with skeggsb a bit and trying without modesetting enabled I'm a bit less convinced it's actually from interrupt races