02:14 gfxstrand[d]: !33959 has been assigned to marge. :transtada128x128:
03:18 gfxstrand[d]: And... Merged.
03:26 chikuwad[d]: wahoo
04:05 mangodev[d]: gfxstrand[d]: [yes, kind of](https://gitlab.freedesktop.org/mesa/mesa/-/issues/13459)
04:05 mangodev[d]: though it's only listed as the blur, not just general flickering
04:06 mangodev[d]: the problem has slowly been diminishing over time, i think it's a combination of multiple issues from many sources
04:13 mangodev[d]: rebuilt, still happens
04:13 mangodev[d]: when i have the spare time, i may try building the servo webrender test suite to see what's going wrong
04:54 mhenning[d]: mangodev[d]: I already ran the webrender test suite and wasn't able to reproduce it
04:54 mhenning[d]: as is listed on the bug
04:55 mhenning[d]: mangodev[d]: also, if the bug is wrong please go ahead and correct it
05:15 gfxstrand[d]: I wasn't able to reproduce on Zink+ANV but I didn't get around to trying with NVK.
09:01 snowycoder[d]: What does the SASS `sgxt` instruction do?
09:01 snowycoder[d]: Oh, nvm, there's a foldable impl
10:00 karolherbst[d]: signextend 🙂
10:02 snowycoder[d]: And for unsigned (`.u32`), it clears the N upper bits (right?)
10:14 karolherbst[d]: I think so
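[editor's note] The `sgxt` semantics discussed above (sign-extend for signed, clear the upper bits for unsigned) can be modeled roughly in Python; this is a sketch of the described behavior, and the N==0 / N>=32 edge cases here are assumptions, not verified against hardware:

```python
MASK32 = (1 << 32) - 1

def sgxt(x: int, n: int, signed: bool) -> int:
    """Rough model of SASS SGXT on a 32-bit value: keep the low n bits
    of x; for the signed form, sign-extend from bit n-1; for .u32,
    zero the upper 32-n bits."""
    x &= MASK32
    if n == 0:
        return 0          # assumption: no bits kept
    if n >= 32:
        return x          # assumption: value passes through
    low = x & ((1 << n) - 1)
    if signed and (low >> (n - 1)) & 1:
        # fill the upper bits with the sign bit
        return (low | (MASK32 << n)) & MASK32
    return low
```

For example, `sgxt(0xFF, 4, False)` keeps only the low nibble (`0xF`), while `sgxt(0x8, 4, True)` sign-extends to `0xFFFFFFF8`.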
10:14 karolherbst[d]: file
10:14 karolherbst[d]: oops, wrong window
10:27 mangodev[d]: gfxstrand[d]: i am curious on one thing pertaining to nvk
10:27 mangodev[d]: what are the notable differences between turing 16xx and 20xx series cards (other than the lack of RT and Tensor cores and power differences)? afaik, both have GSP, and both have other newer hardware capabilities like variable-rate shading (though i *think* not ReBAR?)
10:27 mangodev[d]: are there any other differences or quirks between the two that affect the drivers and their capabilities?
10:35 mohamexiety[d]: Just RT and tensor really
10:35 mohamexiety[d]: None of the Turing GPUs got rebar officially
10:35 chikuwad[d]: mangodev[d]: entirety of turing generation doesn't have ReBAR
10:35 chikuwad[d]: at least on the official driver
10:36 chikuwad[d]: I don't know if that can be worked around in nouveau though
10:36 chikuwad[d]: I know it's possible by uefi firmware patching shenanigans, but
10:36 chikuwad[d]: .-.
10:36 mangodev[d]: chikuwad[d]: wait *what*
10:36 mangodev[d]: i thought ReBAR is a hardware capability
10:37 chikuwad[d]: it's a PCIe spec, yeah
10:38 mangodev[d]: ah
10:38 mangodev[d]: so the cards support it, they just… never bothered implementing it?
10:38 chikuwad[d]: vbios toggle most likely
10:38 mangodev[d]: 🫠
10:39 chikuwad[d]: also it's been a feature since
10:39 chikuwad[d]: PCIe 2 extended config space
10:39 chikuwad[d]: and only became part of the base pcie spec in pcie 4
10:39 mangodev[d]: even still
10:39 chikuwad[d]: but technically it's been possible for _ages_
10:40 mangodev[d]: i think my card has pcie 5?
10:40 mangodev[d]: so there doesn't seem to be much stopping it
10:40 chikuwad[d]: yeh
10:40 chikuwad[d]: except the vbios toggle :D
10:41 chikuwad[d]: https://github.com/xCuri0/ReBarUEFI
10:42 chikuwad[d]: https://github.com/terminatorul/NvStrapsReBar
10:42 mangodev[d]: cursed
10:42 chikuwad[d]: you _can_ patch it into your machine firmware, I'd done it a while ago on my laptop (that I no longer have) too
10:42 chikuwad[d]: but you stand the very real risk of doing it wrong and bricking it
10:43 mangodev[d]: i for some reason always thought i was missing out on rebar by not being on 20 series
10:43 chikuwad[d]: nope, official rebar support is ampere and newer only
10:43 chikuwad[d]: so the only thing you're missing is the RT hardware and the "tensor cores"
10:44 chikuwad[d]: 1600 and 2000 are identical otherwise
10:44 mangodev[d]: kinda feel lucky how i'm situated
10:45 mangodev[d]: aren't some earlier 16xx cards pascal or some weird abomination like that?
10:45 mohamexiety[d]: No
10:45 chikuwad[d]: nope, all 16xx are turing
10:45 mangodev[d]: at least there's that
10:45 chikuwad[d]: they do have volta's encoder engine though iirc, but that's for all turing cards I believe
10:46 x512[m]: Why UEFI should care about ReBAR? OS can reallocate PCI BAR ranges if needed?
10:46 chikuwad[d]: yeah but the firmware has to expose support first
10:47 mangodev[d]: was nvidia maybe just not confident with the feature until ampere?
10:47 chikuwad[d]: who knows
10:48 chikuwad[d]: not me for sure :3
10:48 mangodev[d]: i am very excited for the rapid changes in nvk lately
10:49 mangodev[d]: i noticed mesa bumped version in git recently?
10:49 mangodev[d]: 25 feels like yesterday, feels weird being on 26
10:50 chikuwad[d]: yeah 25.3 branchpoint happened a few hours ago
10:51 mangodev[d]: vibes-based versioning
10:51 mangodev[d]: 25.3 > 26.0
10:52 chikuwad[d]: there is a system
10:52 mangodev[d]: i think the changes yesterday have improved my system responsiveness though, which is good
10:53 mangodev[d]: chikuwad[d]: i'd think all of the new drivers are part of the decision? honeykrisp, ethos, some others i'm definitely forgetting
10:53 chikuwad[d]: no I meant there is a defined versioning system
10:54 chikuwad[d]: 25.3 is gonna be the last major version for 2025
10:54 mangodev[d]: ahhhh i see
10:54 chikuwad[d]: mesa does quarterly releases
10:54 mangodev[d]: so the versioning can really only go up to .3?
10:54 chikuwad[d]: in XX.Y.Z, XX is the year, Y is the quarter, Z is the minor patch
10:54 chikuwad[d]: yeah
10:55 chikuwad[d]: the quarters are zero indexed
10:55 chikuwad[d]: :3
10:55 chikuwad[d]: so mesa 26.0 will happen in Q1 2026
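[editor's note] The XX.Y.Z scheme chikuwad describes can be sketched as a small helper; this is illustrative, not anything in the Mesa tree:

```python
def mesa_version(year: int, quarter: int, patch: int = 0) -> str:
    """Mesa's XX.Y.Z scheme: XX = two-digit year, Y = zero-indexed
    quarter (0-3), Z = bugfix release on that branch."""
    if not 0 <= quarter <= 3:
        raise ValueError("quarter is zero-indexed, 0-3")
    return f"{year % 100}.{quarter}.{patch}"
```

So the 25.3 branchpoint mentioned above is the last major branch of 2025, and `mesa_version(2026, 0)` gives the first release of 2026.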
10:56 mangodev[d]: makes sense
10:56 mangodev[d]: now i seriously don't know how i didn't make that correlation earlier
11:21 mohamexiety[d]: kinda annoying tbh cuz the branch point was the 31st before, and now some of the bigger perf things won't make it till 2026
11:21 mohamexiety[d]: oh well
11:29 gfxstrand[d]: We switched to quarterly several years ago. We used to try and do feature-based releases but there are just too many drivers with independent feature sets that it became impossible to decide what was or wasn't a major release.
12:24 karolherbst[d]: mohamexiety[d]: well maintainers can still move them to stable if they think it's worth it
12:25 karolherbst[d]: just maybe not 2 weeks before release 🙃
12:25 karolherbst[d]: but also kinda depends
12:25 karolherbst[d]: if it's a big feature then yeah.. no
13:39 snowycoder[d]: Anybody worked on fragment shader interlock?
13:41 snowycoder[d]: The produced shaders are, on one half, slightly cursed async code using tickets and spin loops, and on the other half, extremely cursed handling of helper invocations.
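[editor's note] A ticket-plus-spin-loop scheme like the one snowycoder describes can be modeled as a toy ticket lock; this is a sketch of the general technique, not the actual shader code NV emits:

```python
import itertools

class TicketLock:
    """Toy model of a ticket/spin critical section: each invocation
    takes a monotonically increasing ticket and spins until the
    'now serving' counter reaches it."""
    def __init__(self):
        self._tickets = itertools.count()
        self.serving = 0

    def acquire(self, spin=lambda: None):
        t = next(self._tickets)
        while self.serving != t:
            spin()          # busy-wait; on the GPU this is the spin loop
        return t

    def release(self):
        self.serving += 1   # hand the critical section to the next ticket
```

Ordering falls out of the monotonic ticket counter: whoever grabbed the earlier ticket enters the critical section first.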
13:42 marysaka[d]: I did poke at it some months ago a bit
13:43 marysaka[d]: the part I still needed to look at was how that memory that is used is allocated, it seems to be per SM (so need TPC/GPC info to allocate)
13:43 marysaka[d]: and if that block of memory is init with anything significant
13:43 marysaka[d]: but yeah it's very cursed, they also set one other thing on the command stream for ordering
13:44 snowycoder[d]: I think that's needed to generate the special ticket register?
13:44 marysaka[d]: it's another bit in the SPH actually
13:45 marysaka[d]: can't remember precisely but I saw only one bit change when it was reading ticket reg
13:45 marysaka[d]: I could probably grab my branch and push it later; it was very much not done etc.
13:47 snowycoder[d]: Dw, I was only trying an open issue, didn't think that anybody worked on it already
13:47 marysaka[d]: there is an issue I think!
13:47 marysaka[d]: didn't assign myself because had no time to poke more at it
13:48 snowycoder[d]: You have a way deeper understanding of it than I do, though
13:48 marysaka[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9634
13:48 marysaka[d]: snowycoder[d]: thing is that I need to find my note again because it was messy 😄
13:50 marysaka[d]: For my branch, it's mostly just junk in terms of NIR lowering (I was mostly playing around with the ordering ticket a bit) https://gitlab.freedesktop.org/marysaka/mesa/-/commits/nvk-fsi
13:51 marysaka[d]: If someone does touch FSI other than me, please try to write unit tests to understand this mess more 😄
13:56 esdrastarsis[d]: marysaka[d]: triang3l[d] 👀
14:00 snowycoder[d]: marysaka[d]: I know that nv-prop does that, but why do we need a spin-lock at the end of the critical section? 0_o
14:01 marysaka[d]: snowycoder[d]: can't remember the shaders precisely but might be to delay the write?
14:27 snowycoder[d]: Do we have unit tests that check hardware behaviour? I can only see `mme/tests` that checks our simulator of mme against the real thing.
14:29 snowycoder[d]: I don't think we want to simulate a draw call to unit test interlocks
14:30 mohamexiety[d]: gfxstrand[d]: and I use crucible for this. writing and running random tests for specific HW behavior
14:30 mohamexiety[d]: nothing merged though since it's all usually fairly specific and targeted, all in downstream repos
14:31 mohamexiety[d]: crucible: https://gitlab.freedesktop.org/mesa/crucible
14:33 mohamexiety[d]: https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/155 e.g. what we used to RE the internal tiling layout
14:34 mohamexiety[d]: you can also write your own (e.g. mel did this for the sync behavior) but I personally prefer crucible since it simplifies a lot of the boilerplate and such
14:39 triang3l[d]: marysaka[d]: Just ignore the Piglit test for sample interlock at least, it seems to be doing something that the API doesn't guarantee https://gitlab.freedesktop.org/mesa/piglit/-/issues/93
14:39 triang3l[d]: though it will probably pass on Nvidia anyway
14:41 marysaka[d]: triang3l[d]: I was testing with VKCTS
14:42 marysaka[d]: but one idea I had was to actually have some integration test on nvk/nak to test the ordering ticket behavior and ensure it is consistent between generations etc.
14:43 mohamexiety[d]: oh yeah we also do have hwtests in nak
14:43 triang3l[d]: marysaka[d]: By the way, do you safely handle early `discard` from within the critical section?
14:44 marysaka[d]: the NV blob seems to handle that fine, yeah
14:44 marysaka[d]: I didn't really finish my mess, was mostly trying to form something based on the thing I saw but it was mostly incomplete
14:47 snowycoder[d]: mohamexiety[d]: those only run with compute shaders, from what I know you can't customize the QMD per-test (e.g. setting up the interlock hw thingy)
14:47 mohamexiety[d]: ah
16:07 gfxstrand[d]: steel01[d]: Thanks! I've got an ADB shell now.
16:08 gfxstrand[d]: Not sure what I'm going to do with it but I've got one. 😂
16:30 steel01[d]: gfxstrand[d]: Easier access to logs, if nothing else. logcat gives different stuff compared to dmesg/console.
16:32 steel01[d]: I grabbed your commit yesterday and tossed it at t186. It's not logging anything nouveau specific for me, though. Just the generic 'gbm allocator starting' stuff. 0o Trying a couple different things now to see if I can verify that it's going into the right backend.
16:36 steel01[d]: Oh. If I use the hidl backend, it looks like it tries, but the kernel driver doesn't like what it gets.
16:36 steel01[d]: 01-01 00:00:50.655 0 0 E : [ C0] nouveau 17000000.gpu: gr: TRAP ch 2 [0082e73000 surfaceflinger[576]]
16:36 steel01[d]: 01-01 00:00:50.663 0 0 E : [ C0] nouveau 17000000.gpu: gr: GPC0/PROP trap: 00000400 [RT_LINEAR_MISMATCH] x = 32, y = 0, format = 18, storage type = 0
16:38 steel01[d]: To replicate this, set `TARGET_MINIGBM_HAL_INTERFACE ?= hidl` with the other new flags in tegra-common.
18:23 steel01[d]: 01-01 00:00:26.626 0 0 E : [ C0] nouveau 57000000.gpu: fifo: fault 01 [WRITE] at 000000000276c000 engine 00 [gr] client 0f [GPC0/PROP_0] reason 02 [PTE] on channel 3 [0400302000 BootAnimation[645]]
18:23 steel01[d]: 01-01 00:00:26.643 0 0 E : [ C0] nouveau 57000000.gpu: fifo:000000:0003:[BootAnimation[645]] rc scheduled
18:23 steel01[d]: 01-01 00:00:26.643 0 0 E : [ C0] nouveau 57000000.gpu: fifo:000000: rc scheduled
18:23 steel01[d]: 01-01 00:00:26.643 81 81 E nouveau 57000000.gpu: fifo:000000:0003:0003:[BootAnimation[645]] errored - disabling channel
18:23 steel01[d]: 01-01 00:00:26.039 500 500 I PowerHalLoader: Successfully connected to Power HAL AIDL service.
18:23 steel01[d]: 01-01 00:00:26.681 81 81 W nouveau 57000000.gpu: BootAnimation[629]: channel 3 killed!
18:23 steel01[d]: Mmm, t210 is unhappy in a different manner.
18:38 cubanismo[d]: mangodev[d]: I don't think we'd had time to validate it until after Turing launched. We don't generally update the VBIOS of shipping products unless it's something catastrophic.
18:39 chikuwad[d]: pretty sure there was a vbios update tool for ampere back then that enabled rebar
18:39 chikuwad[d]: yeah
18:39 chikuwad[d]: https://nvidia.custhelp.com/app/answers/detail/a_id/5165/~/nvidia-resizable-bar-firmware-update-tool
18:39 chikuwad[d]: unless it updated something other than the vbios
18:40 cubanismo[d]: I guess we considered it catastrophic at that point then
18:41 mohamexiety[d]: ampere was when rebar started getting in the mainstream marketing
18:41 mohamexiety[d]: so it makes sense
18:42 mohamexiety[d]: the rollout has been kinda weird on windows tbh; apparently d3d wasn't really made with it in mind, so some games have had regressions and such, so the driver ends up selectively enabling/disabling it
18:42 cubanismo[d]: IIRC, we didn't have driver code in place until around that time to take advantage anyway, but timelines are always fuzzy for me. Usually by the time something's public I've already forgotten about it.
18:43 cubanismo[d]: I personally didn't know ReBar was a thing until the marketing started.
18:44 mohamexiety[d]: I was kind of surprised it came this late because it feels kind of intuitive that this would be better and much easier than having to fight over that 256MB region
18:44 mohamexiety[d]: but I lack _a lot_ of historical context
18:46 cubanismo[d]: Well it's probably like a lot of things. At any given time, there are thousands of things us engineers know could easily be better. However, let's say there are hundreds of us, not thousands, so there's an order of magnitude mismatch between what we can do at any given time and what we would ideally do. Then you take into account how much testing and QA time you need to validate it, and management
18:46 cubanismo[d]: starts asking you smart questions like "How much faster will it make 3DMark? None? Back of the line."
18:47 mohamexiety[d]: oh yeah fair ofc. I was moreso talking about it as an industry thing in general. like it wasn't just NV, it simply wasn't much of a thing pre-2020
18:47 cubanismo[d]: But then AMD blogs that ReBar makes these 3 top games faster because they chose that as their marquee feature, put in the time to work with the game devs to take advantage of it and/or saw that they could more easily port Xbox games to PC with it, etc. Then you can go back to those same managers and say "10%" and it gets done right quick.
18:48 mohamexiety[d]: yeah
18:48 HdkR: It also helped that NVIDIA did /really/ well even without rebar. Which was kind of amazing by itself :D
18:48 cubanismo[d]: Yeah, I was going to say, for reasons it matters more to AMD than NV.
18:49 cubanismo[d]: It still frustrates me how much the press jumped on AMD's "AMD is 10% faster with Vulkan, NV is only 2% faster! AMD FTW!" marketing, when in the actual numbers, our OpenGL was still faster than both their Vulkan and OpenGL. It was good marketing.
18:50 mohamexiety[d]: and then Intel just lives and dies on rebar. no rebar? welp, there goes like most of your perf. (might have been fixed recently but this was a huge pain point for their dGPU effort. budget GPUs that can't actually be used on old systems)
18:50 HdkR: They did well with that marketing.
18:50 mohamexiety[d]: ye
18:51 cubanismo[d]: Yeah, someone at AMD should have gotten an award for that. Turning bad OpenGL/D3D11 perf into a marketing win is respectable.
18:52 cubanismo[d]: Most BIOSes suck at allocating BAR space. If it's always big, you just fail to boot on those systems.
18:52 cubanismo[d]: So you have to wait until the OS can go check that everything is sane, then biggen it.
18:52 sonicadvance1[d]: misyltoad[d]: Some PCIe devices genuinely explode if given a large BAR size. For...some reason.
18:52 mohamexiety[d]: oh damn
18:52 cubanismo[d]: Yeah, I can see it being a problem in both directions.
18:52 sonicadvance1[d]: Well, not actual sparks. Just either halt the system or stop working I guess 😛
18:53 cubanismo[d]: Also, maybe you don't want the biggest BAR possible when you have 8 GPUs in the system, just when you have 2 or so.
18:53 cubanismo[d]: Only the OS can resolve stuff like that.
18:53 sonicadvance1[d]: Ideally you size your IOMMU aperture limits for ungodly amounts of VA and not care, but ARM devices especially don't want to do that.
18:54 mohamexiety[d]: ~~it's always those ARM devices~~
18:54 mohamexiety[d]: why though?
18:56 sonicadvance1[d]: I mean, IOMMU has overhead since it has to do TLB lookups and stuff. I assume if you limit the maximum VA it can handle that you can cut the size of those, making it more efficient, smaller die area, etc etc
18:57 sonicadvance1[d]: No one likes a device where enabling IOMMU on the GPU causes a 5% perf hit.
18:57 mohamexiety[d]: ah yeah, fair enough
18:57 mohamexiety[d]: thanks!
18:57 x512[m]: Is ReBAR really important? Is it much better compared to staging GTT buffers?
18:59 cubanismo[d]: It can avoid a copy, yeah.
18:59 cubanismo[d]: Why stage things into a sysmem buffer to blit them to vidmem if you can just... write them straight to vidmem.
18:59 cubanismo[d]: All assuming it's worth writing them to vidmem, but often it is.
19:00 sonicadvance1[d]: Game dev paradigms are also shifting that they want coherent memory visibility just like the consoles offer.
19:01 cubanismo[d]: Yeah, traditionally the pipelining you get with the staging buffer is also nice. E.g., it'd all just happen in OpenGL context order anyway.
19:01 cubanismo[d]: With Vulkan & friends, you already have to manage the complexity of ordering things, so you might as well just manage the complexity of not using a pipelined operation too
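[editor's note] The staging-vs-direct tradeoff above is visible in how a Vulkan app picks a memory type for uploads; here is an illustrative sketch (the bit values are the real `VkMemoryPropertyFlagBits` constants, but the selection function itself is hypothetical):

```python
# Real Vulkan flag values; the heap exposed by ReBAR is the one whose
# memory types are both DEVICE_LOCAL and HOST_VISIBLE.
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT = 0x1
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT = 0x2

def choose_upload_path(memory_type_flags):
    """Prefer a DEVICE_LOCAL | HOST_VISIBLE type (direct CPU writes to
    vidmem); otherwise fall back to a HOST_VISIBLE staging buffer plus
    a copy into DEVICE_LOCAL memory."""
    for i, flags in enumerate(memory_type_flags):
        if (flags & VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT and
                flags & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT):
            return ("direct", i)
    for i, flags in enumerate(memory_type_flags):
        if flags & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT:
            return ("staging", i)
    return (None, -1)
```

With only a 256MB BAR the combined type is small or absent and you end up on the staging path, which is the copy cubanismo mentions avoiding.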
19:01 gfxstrand[d]: steel01[d]: What is hidl?
19:01 x512[m]: What if the kernel implemented transparent mapping of VRAM buffers, with paging to CPU memory if needed?
19:02 chikuwad[d]: gfxstrand[d]: Hardware Interface Definition Language
19:02 chikuwad[d]: https://source.android.com/docs/core/architecture/hidl
19:02 cubanismo[d]: That's not as clear of a win as you'd think.
19:02 chikuwad[d]: though it was replaced with AIDL in android 10
19:02 chikuwad[d]: (A for Android)
19:03 cubanismo[d]: E.g., see WDDMv1. It wasn't great.
19:04 x512[m]: If paging to CPU memory is implemented, any VRAM can be mmap'ed regardless of BAR size.
19:05 cubanismo[d]: Sure, but will it actually be VRAM when you want it to be?
19:05 cubanismo[d]: Or will it be busy paging?
19:05 steel01[d]: gfxstrand[d]: One of the interfaces between the hal and the android frameworks. The old style gets called 'libhardware'. Then there's hidl, which was used for several versions. Now that's replaced by aidl. Devices targeting the latest version of android need to use aidl. But seems like that's not working atm, so we can use hidl for initial verification.
19:06 steel01[d]: Oh, but meh. Hang on, you need a couple extra changes for that to build.
19:06 x512[m]: cubanismo[d]: I mean VRAM->CPU RAM paging is only used for CPU access, VRAM is always allocated.
19:06 cubanismo[d]: Yeah, but if you want to write to it, it has to get paged to VRAM at some point.
19:06 x512[m]: Yes, some synchronization call.
19:07 cubanismo[d]: And then perfwise, you're better off with the staging buffer.
19:07 chikuwad[d]: rebar makes it such that the cpu can just access all the vram from the get go
19:08 sonicadvance1[d]: Let's just accept rebar and not have to deal with the nightmare of dynamic paging 😛
19:08 steel01[d]: steel01[d]: https://gitlab.incom.co/CM-Shield/android_external_minigbm/-/commit/e4d8bb4f52a236e62d4c00b47785adb90a6c57a1
19:08 steel01[d]: gfxstrand[d] This should just get squashed to your commit. Just a little more build system plumbing.
19:10 cubanismo[d]: Yeah, besides the perf, having large BARs just makes lots of things easier. It's not a bad thing.
19:19 gfxstrand[d]: steel01[d]: Re-building with that now.
19:20 gfxstrand[d]: Should be boot-looping with it in just a few minutes. 😂
19:29 gfxstrand[d]: That didn't seem to enable anything
19:31 gfxstrand[d]: Missing the android.mk bit
19:31 gfxstrand[d]: Trying again
19:37 gfxstrand[d]: yeah
19:37 gfxstrand[d]: But I've got it now. I'm just waiting for it to finish building so I can re-flash
19:49 gfxstrand[d]: `servicemanager: Caller(pid=771,uid=1000,sid=u:r:hal_graphics_composer_default:s0) Could not find android.hardware.graphics.allocator.IAllocator/default in the VINTF manifest. No alternative instances declared in VINTF.`
19:58 steel01[d]: gfxstrand[d]: Mmm. Did you do a installclean before you rebuilt after the hidl change?
19:59 gfxstrand[d]: I did
19:59 steel01[d]: 01-01 00:00:26.472 229 229 I servicemanager: Caller(pid=629,uid=1003,sid=u:r:bootanim:s0) Could not find android.hardware.graphics.allocator.IAllocator/default in the VINTF manifest. No alternative instances declared in VINTF.
19:59 steel01[d]: 01-01 00:00:26.499 229 229 I servicemanager: Caller(pid=629,uid=1003,sid=u:r:bootanim:s0) Could not find android.hardware.graphics.allocator.IAllocator/default in the VINTF manifest. No alternative instances declared in VINTF.
19:59 steel01[d]: 10-16 17:20:15.949 229 229 I servicemanager: Caller(pid=1044,uid=1000,sid=u:r:surfaceflinger:s0) Could not find android.hardware.graphics.allocator.IAllocator/default in the VINTF manifest. No alternative instances declared in VINTF.
19:59 steel01[d]: 10-16 17:20:15.982 229 229 I servicemanager: Caller(pid=1044,uid=1000,sid=u:r:surfaceflinger:s0) Could not find android.hardware.graphics.allocator.IAllocator/default in the VINTF manifest. No alternative instances declared in VINTF.
19:59 steel01[d]: I get a few of those too, but it's still trying. 0o Not from hwc, though.
20:00 gfxstrand[d]: I'm also seeing hwcomposer segfault
20:00 steel01[d]: If you do a `ps -ef |grep alloc`, is it running?
20:01 steel01[d]: jetson:/ # ps -ef |grep alloc
20:01 steel01[d]: system 461 1 0 00:00:11 ? 00:00:00 android.hidl.allocator@1.0-service
20:01 steel01[d]: system 467 1 0 00:00:11 ? 00:00:00 android.hardware.graphics.allocator@4.0-service.minigbm_nouveau
20:01 gfxstrand[d]: it is
20:01 gfxstrand[d]: system 510 1 0 00:00:22 ? 00:00:00 android.hidl.allocator@1.0-service
20:01 gfxstrand[d]: system 513 1 0 00:00:22 ? 00:00:00 android.hardware.graphics.allocator@4.0-service.minigbm_nouveau
20:01 steel01[d]: Okay, so it's probably in a similar state to mine. And yes, hwc does implode for me.
20:01 gfxstrand[d]: And with a low enough PID that it's not restarting constantly
20:02 steel01[d]: Hmm, after a reboot, I'm not getting the nouveau kernel errors I saw earlier. Uhhhh...
20:03 gfxstrand[d]: I think I need to add some super verbose logging
20:03 steel01[d]: Hwc is throwing a npe for me. Errors trying to get the display info.
20:04 gfxstrand[d]: Okay, so we're seeing the same thing
20:04 steel01[d]: I wonder if that's because you're telling it to use nouveau, which doesn't have display. Scanout or whatever the internal term for that is.
20:04 gfxstrand[d]: That's quite possible
20:05 gfxstrand[d]: The backtrace is at
20:05 gfxstrand[d]: #00 pc 00000000000261a0 /vendor/bin/hw/android.hardware.composer.hwc3-service.drm (android::HwcDisplay::GetDisplayBoundsMm()+20)
20:05 steel01[d]: Afaik, other arm display ips also do the same. Like adreno and all. It shouldn't be a unique issue.
20:05 steel01[d]: gfxstrand[d]: Yep.
20:06 gfxstrand[d]: jetson:/ # ls /dev/dri
20:06 gfxstrand[d]: card0 card1 renderD128 renderD129
20:06 gfxstrand[d]: Presumably minigbm is sitting on the right one since it started okay
20:06 steel01[d]: In this case card0 is tegra-drm and card1 is nouveau. Could probably change based on probe order, but in this case, tegra-drm is probed long before nouveau.
20:07 steel01[d]: It looks like it. I added some logging to see what was being checked, but not enough to show what got picked. ><
20:08 steel01[d]: 01-01 00:00:30.938 757 771 E [minigbm:drv_get_backend(106)]: Looking for driver to match tegra
20:08 steel01[d]: 01-01 00:00:30.938 757 771 E [minigbm:drv_get_backend(106)]: Looking for driver to match nouveau
20:08 steel01[d]: So all that says is that it looked for tegra, didn't find it, then looked for nouveau. Presumably it then matched, but I didn't add logging for that.
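[editor's note] The probe order in steel01's log can be sketched like this; the function and data shapes here are illustrative, not minigbm's actual `drv_get_backend` API:

```python
def pick_backend(candidate_driver_names, backends):
    """Illustrative model of the matching in the log above: walk the
    kernel driver names reported for the DRM node in order and return
    the first backend whose name matches, or None."""
    for name in candidate_driver_names:
        for backend in backends:
            if backend["name"] == name:
                return backend
    return None
```

With candidates `["tegra", "nouveau"]` and only a nouveau backend compiled in, the tegra lookup fails and nouveau matches, which is consistent with the two "Looking for driver to match" lines above.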
20:12 gfxstrand[d]: Trying to get minigbm to actually log for me
20:13 steel01[d]: I had issues on the aidl default variant, couldn't get anything to log at all. Which is why I switched to hidl and my logs suddenly showed up.
20:13 gfxstrand[d]: ugh
20:14 gfxstrand[d]: I should be on hidl
20:14 steel01[d]: You are, per the ps output.
20:14 gfxstrand[d]: What is the log tag?
20:14 gfxstrand[d]: I thought it was minigbm but maybe that's wrong?
20:14 steel01[d]: steel01[d]: `minigbm:$functionname`, apparently.
20:15 steel01[d]: Maybe it's not getting to where you added log lines?
20:15 gfxstrand[d]: Wait, the log tag has the function name in it?
20:16 steel01[d]: drv_log does magic, I'm guessing.
20:16 steel01[d]: Oh, what are you using to log?
20:16 gfxstrand[d]: `adb logcat`
20:16 steel01[d]: In the code, I mean.
20:16 gfxstrand[d]: `drv_logd()`
20:17 steel01[d]: Ah, bump that to e. d is probably not enabled by default.
20:20 steel01[d]: There's frameworks handling for log tag log levels. On tegra, I default the global to 'I', cause D and V outright slow down boot and general operation. It's insane levels of spam. And for some reason, google thinks it's fine to ship with debug logging enabled. ><
20:22 gfxstrand[d]: When I do `m gralloc.minigbm_nouveau`, where does it put the compiled executable?
20:22 steel01[d]: out/target/project/foster/vendor/bin/hw
20:23 gfxstrand[d]: That says it hasn't been touched in an hour
20:23 steel01[d]: Oh hang on, that's the wrong build target. That's the old libhardware version that's not in use.
20:24 steel01[d]: For hidl, you want `android.hardware.graphics.allocator@4.0-service.minigbm_nouveau`.
20:25 steel01[d]: You'll need to push that and the dependency. Getting that path.
20:27 steel01[d]: vendor/bin/hw/android.hardware.graphics.allocator@4.0-service.minigbm_nouveau
20:27 steel01[d]: vendor/lib64/libminigbm_gralloc_nouveau.so
20:27 steel01[d]: vendor/lib64/hw/android.hardware.graphics.mapper@4.0-impl.minigbm_nouveau.so
20:27 steel01[d]: To be safe, you probably want to build `android.hardware.graphics.allocator@4.0-service.minigbm_nouveau` and `android.hardware.graphics.mapper@4.0-impl.minigbm_nouveau`, which yes you can do in one build command. Then push all three. `libminigbm_gralloc_nouveau.so` is where the logic you're touching is, though. It's 'probably' okay to only build and push that, but deps can do weird things sometimes.
20:34 gfxstrand[d]: Ah. It only re-built `vendor/lib64/libminigbm_gralloc_nouveau.so`
20:34 gfxstrand[d]: That's why mtimes hadn't changed for the others
20:35 steel01[d]: Ah. ninja might actually be being smart about that. Sometimes that works well. Other times it doesn't know all the source inputs, so... it's not so smart.
20:54 gfxstrand[d]: Okay, I've confirmed I can adb push stuff and I've got logging working.
20:54 steel01[d]: That simplifies the loop.
21:00 gfxstrand[d]: Yeah but now I have no idea how to tell HWComposer to look at the other device. 😭
21:02 steel01[d]: There's a property in /vendor/build.prop. vendor.drm something. You can try changing that from nouveau to tegra and see if if makes a difference.
21:02 steel01[d]: I'm guessing it won't. Or will break stuff worse.
21:04 gfxstrand[d]: Does Qualcomm have this problem?
21:06 gfxstrand[d]: Ooh
21:07 gfxstrand[d]: So this is fun...
21:07 gfxstrand[d]: There's an old tegra back-end that got deleted
21:08 steel01[d]: Yes. I tried to rez that and failed utterly. Broken stride causing stairsteps. Nothing looked off to me, though.
21:08 gfxstrand[d]: Oh, well I can fix that
21:10 gfxstrand[d]: Anyway, I think the answer is to allocate from the tegra node
21:10 airlied[d]: android, ecosystem of choice
21:10 steel01[d]: I've got the patch somewhere still if you want to look at what I had. Will be a few before I can go hunt, though.
21:10 gfxstrand[d]: I need to decide if my patch is better or not
21:11 gfxstrand[d]: I'm fairly sure my code is better. :frog_upside_down:
21:12 steel01[d]: Almost certainly. :p Though, I'd still like something I can use on t194 without nouveau. That doesn't need to limit the nouveau enabled path, though.
21:14 steel01[d]: But maybe just changing the name to tegra would be enough. I *think* that can pass through nouveau commands. But maybe that was the inverse.
21:14 gfxstrand[d]: I can make it use the tegra ioctl
21:14 gfxstrand[d]: It's easy enough, I think.
21:19 gfxstrand[d]: ugh...
21:19 gfxstrand[d]: I'm not sure if nouveau GL will be okay with a buffer from Tegra
21:21 steel01[d]: I mean... that's what I've been doing, afaik. 😛
21:22 steel01[d]: Using gbm_gralloc, which uses something inside mesa to do its thing. Pretty sure that was using the tegra path for alloc.
21:22 steel01[d]: And before that, I was using minigbm dumb buffers via the tegra path.
21:24 gfxstrand[d]: Oh, there's a set_tiling ioctl
21:24 gfxstrand[d]: That's silly but okay
21:25 steel01[d]: https://gitlab.incom.co/CM-Shield/android_external_minigbm/-/commit/79dce816510669878792927a13248bc2d11f182e
21:25 steel01[d]: This was my attempt.
21:26 steel01[d]: Basically just a revert of what was dropped, a few build fixes, and a couple blind attempts to convert from the downstream tiling to upstream. Like... very blind.
21:34 gfxstrand[d]: Okay, I've pulled that on top of my nouveau patch, added the gralloc4 service stuff, switched to tegra, and I'm attempting a build
21:35 steel01[d]: 🔥 Flames. Probably the result of that. 😛
21:41 gfxstrand[d]: So... I suspect what we want at the end of the day is two back-ends and a shared tiling file that does tiling calculations for both
21:43 steel01[d]: And tegra works on the embedded/mobile devices and nouveau will be used on desktop/laptop?
21:44 gfxstrand[d]: Yup
21:44 gfxstrand[d]: They're different ioctls
21:44 gfxstrand[d]: But fundamentally the same tiling format
21:48 gfxstrand[d]: I've got tegra but hwcomposer is still crashing
21:49 steel01[d]: Oh? Uhhh. It worked on android 15. If you push everything to your repo again, I'll take a look in a bit. Might need to shuffle some props again.
21:50 gfxstrand[d]: Pushed
21:50 gfxstrand[d]: I've got to head home
21:51 steel01[d]: Sure. Gives me time to tinker with it overnight before you're back with hands on the unit.