11:55hanetzer: ello folks. researching mmiotrace a bit, and y'all seem to be the primary (only?) consumer at the moment. end goal is mmiotrace for !x86. the page https://nouveau.freedesktop.org/MmioTraceDeveloper.html suggests using emulation instead of page faulting, anyone have an overview of how that would work? I can't see emulating all load/store opcodes as being particularly performant
12:09karolherbst: hanetzer: yeah.. it's not, but it would help with a few issues
12:09karolherbst: atm repeat instructions aren't supported
12:09karolherbst: so if you got an x86 rep mov...
12:10hanetzer: karolherbst: as in, the current state of mmiotrace can't deal with it?
12:10karolherbst: emulation would have the benefit of being perfect in tracking what happens, but as you said, also quite slow
12:10karolherbst: dunno.. maybe we can deal with reps in another way
12:10hanetzer: hrm. well, that said, using the kprobes decoding seems like a good idea too :)
12:11karolherbst: yeah, I am all for using shared/common code instead of handwritten decoding stuff
12:11hanetzer: I had an idea, but it seems like it would be quite a lot of work, to do something like serialice in this regard.
12:11karolherbst: ohh I should remove "support large pages" from the list, I implemented that already
12:12karolherbst: but yeah.. I think making it non x86 is a good idea
12:12hanetzer: and correct me if I'm wrong, but it *appears* that, aside from being x86-only, it's also pci-only?
12:12karolherbst: especially since the large page support already stops at 1G pages, and I think the way I implemented it is x86-only as well
12:12karolherbst: hanetzer: mhhhh.. no clue
12:12karolherbst: could be
12:13hanetzer: (the sort of stuff you'd want to use mmiotrace on in arm is decidedly not pci)
12:13karolherbst: I'd start with PCIe cards on arm though then
12:13hanetzer: I mean, to be fair even at the current state mmiotrace is very not performant on account of shutting down all cores but one.
12:14karolherbst: but that stuff is soo old, it was just not very useful back then
12:15karolherbst: not sure how much of this all is an issue and how hard it would be to support tracing with multiple CPUs enabled
12:15karolherbst: might need some gross locking
12:15hanetzer: to be frank, I think that in the case of pci blobs, you're more likely to get an x86 blob than an arm one, yeah?
12:15karolherbst: yes, but it's not only useful for blobs
12:15hanetzer: izzat so?
12:16karolherbst: well.. you have a driver bug and have no idea what happens, so you enable mmiotrace on your open source driver and see what the driver is doing
12:16karolherbst: I actually got a bug report from somebody doing exactly that, because I broke the tracer when implementing support for huge pages
12:16hanetzer: ah. to be frank I think tracepoint tech is prolly better in that regard. no faulting.
12:17karolherbst: the good thing about mmiotrace though is that it dumps the stuff in a "defined" format and you can have tooling around it to decode the "commands"
12:18hanetzer: true enough.
12:18hanetzer: tbqf I'm not sure what an mmiotrace *looks like*, I just understand it conceptually.
12:21hanetzer: theoretically speaking, if I were to start working on kprobe based decoding, could one use qemu to test it?
12:23karolherbst: probably yes
12:23karolherbst: mmiotrace isn't all that magic in the end
12:23karolherbst: it just marks all pages as not there so every page access page faults
12:24hanetzer: the problem is mmu and such is kinda magic on its own :P
12:24karolherbst: I am sure that a modern linux kernel has a lot other good infrastructure to implement it in a better way
12:24karolherbst: so one advantage we have here is that the "arming" only happens when ioremap is called and only for the pages mapped there
12:25karolherbst: but because memory can be accessed randomly and we have no control over where and how it's accessed, I suspect we still need the MMU to fault on every access
12:32hanetzer: I was thinking, perhaps it may be a good idea to boil out core stuff into like, libmmio.ko and have one for pci and non-pci (because as mentioned, most of the stuff you're gonna mmiotrace on arm is !pci)
12:32hanetzer: eg, it could have been used during the mali/panfrost process :)
12:32hanetzer: (my eventual use case is the blob drivers for hisi chips)
12:37hanetzer: don't suppose you could provide a sample mmiotrace log for reference? :P
13:55karolherbst: hanetzer: uhm.. let me see
15:29karolherbst: tagr: what's a GCC?
15:30karolherbst: ohh HdkR might be able to tell as well
19:22HdkR: karolherbst: What's this then?
19:23karolherbst: no clue.. some part of the GPC I can't figure out what it is :D
19:50HdkR: Ah, I don't really remember details like that anymore :P
23:29karolherbst: HdkR: I suspect it's the GPC constbuffer cache :P
23:30HdkR: Oh, could be
23:31karolherbst: yeah soo... I see the pointer used as a const buffer upload thing.. we set the const buffer inside the QMD and launch that thing, fine.. a few tests later the GPU faults on that address... :(
23:37HdkR: Faults because it tries prefetching and fails or something?
23:38karolherbst: let me disable prefetching
23:39karolherbst: huh.. can one even disable that
23:40karolherbst: yeah.. since ampere
23:43karolherbst: HdkR: the odd thing is.. we always bind to CB1 and we overwrote that binding multiple times since the fault
23:44HdkR: huh, is the faulting address actually in the bound range? Or does it potentially roll off the end?
23:45HdkR: Since it could be trying to fetch past a 4k page, but nothing is at that location
23:45HdkR: Since GPU page is larger
23:46karolherbst: how large?
23:46HdkR: 16k or 64k depending on generation
23:48karolherbst: the kernel totally does ignore my alignment arg
23:48HdkR: Could be an alignment fault as well then
23:48karolherbst: I suspect it's clamped by the size
23:49karolherbst: okay lol
23:49karolherbst: size *= 0x10 and the fault is gone
23:49HdkR: Some things in the pipeline will fetch beyond the declared size sadly
23:49karolherbst: good to know
23:49karolherbst: now I have GPC1/T1_0 faulting
23:49karolherbst: whatever T1_0 is
23:49HdkR: Probably TPC identification
23:50karolherbst: yeah.. something
23:53HdkR: Not sure how the two TPCs are declared in fault land, could be T1_0 and T1_1, or T0_0, or T2_0 :P
23:54HdkR: Oh wait, skipped a level, there are more TPCs than two per GPC
23:59karolherbst: but anyway.. good to know it might be something stupid like this
23:59karolherbst: I suspect for UBOs we might want to make sure to always have something 64k aligned...