11:55 hanetzer: ello folks. researching mmiotrace a bit, and y'all seem to be the primary (only?) consumer at the moment. end goal is mmiotrace for !x86. the page https://nouveau.freedesktop.org/MmioTraceDeveloper.html suggests using emulation instead of page faulting, anyone have an overview of how that would work? I can't see emulating all load/store opcodes as being particularly performant
12:09 karolherbst: hanetzer: yeah.. it's not, but it would help with a few issues
12:09 karolherbst: atm repeat instructions aren't supported
12:09 karolherbst: so if you got an x86 rep mov...
12:10 hanetzer: karolherbst: as in, the current state of mmiotrace can't deal with it?
12:10 karolherbst: yep
12:10 karolherbst: emulation would have the benefit of being perfect in tracking what happens, but as you said, also quite slow
12:10 karolherbst: dunno.. maybe we can deal with reps in another way
12:10 hanetzer: hrm. well, that said, using the kprobes decoding seems like a good idea too :)
12:11 karolherbst: yeah, I am all for using shared/common code instead of handwritten decoding stuff
12:11 hanetzer: I had an idea, but it seems like it would be quite a lot of work, to do something like serialice in this regard.
12:11 karolherbst: ohh I should remove "support large pages" from the list, I implemented that already
12:12 karolherbst: but yeah.. I think making it non x86 is a good idea
12:12 hanetzer: and correct me if I'm wrong, but it *appears* that, aside from being x86-only, its also pci-only?
12:12 karolherbst: especially since the large page support already stops at 1G pages, and I think the way I implemented it is x86 only as well
12:12 karolherbst: hanetzer: mhhhh.. no clue
12:12 karolherbst: could be
12:13 hanetzer: (the sort of stuff you'd want to use mmiotrace on in arm is decidedly not pci)
12:13 karolherbst: I'd start with PCIe cards on arm though then
12:13 hanetzer: I mean, to be fair, even in its current state mmiotrace is not very performant on account of shutting down all cores but one.
12:13 karolherbst: yeah...
12:14 karolherbst: but that stuff is soo old, it was just not very useful back then
12:15 karolherbst: not sure how much of this all is an issue and how hard it would be to support tracing with multiple CPUs enabled
12:15 karolherbst: might need some gross locking
12:15 hanetzer: to be frank, I think that in the case of pci blobs, you're more likely to get an x86 blob than an arm one, yeah?
12:15 karolherbst: yes, but it's not only useful for blobs
12:15 hanetzer: izzat so?
12:16 karolherbst: well.. you have a driver bug and have no idea what happens, so you enable mmiotrace on your open source driver and see what the driver is doing
12:16 karolherbst: I actually got a bug report from somebody doing exactly that, because I broke the tracer when implementing support for huge pages
12:16 hanetzer: ah. to be frank I think tracepoint tech is prolly better in that regard. no faulting.
12:17 karolherbst: maybe
12:17 karolherbst: the good thing about mmiotrace, though, is that it dumps the stuff in a "defined" format and you can have tooling around it to decode the "commands"
12:18 hanetzer: true enough.
12:18 hanetzer: tbqf I'm not sure what a mmiotrace *looks like*, I just understand it conceptually.
12:21 hanetzer: theoretically speaking, if I were to start working on kprobe based decoding, could one use qemu to test it?
12:23 karolherbst: probably yes
12:23 karolherbst: mmiotrace isn't all that magic in the end
12:23 karolherbst: it just marks all pages as not there so every page access page faults
12:23 hanetzer: yeh.
12:24 hanetzer: the problem is mmu and such is kinda magic on its own :P
12:24 karolherbst: I am sure that a modern linux kernel has a lot of other good infrastructure to implement it in a better way
12:24 karolherbst: so one advantage we have here is that the "arming" only happens when ioremap is called, and only for the pages mapped there
12:25 karolherbst: but because memory can be accessed randomly and we have no control over where and how it's accessed, I suspect we still need the MMU to fault on every access
12:32 hanetzer: I was thinking, perhaps it may be a good idea to boil out core stuff into like, libmmio.ko and have one for pci and non-pci (because as mentioned, most of the stuff you're gonna mmiotrace on arm is !pci)
12:32 hanetzer: eg, it could have been used during the mali/panfrost process :)
12:32 hanetzer: (my eventual use case is the blob drivers for hisi chips)
12:37 hanetzer: don't suppose you could provide a sample mmiotrace log for reference? :P
13:55 karolherbst: hanetzer: uhm.. let me see
15:29 karolherbst: tagr: what's a GCC?
15:30 karolherbst: ohh HdkR might be able to tell as well
19:22 HdkR: karolherbst: What's this then?
19:23 karolherbst: no clue.. some part of the GPC I can't figure out what it is :D
19:50 HdkR: Ah, I don't really remember details like that anymore :P
23:29 karolherbst: HdkR: I suspect it's the GPC constbuffer cache :P
23:30 HdkR: Oh, could be
23:31 karolherbst: yeah soo... I see the pointer used as a const buffer upload thing.. we set the const buffer inside the QMD and launch that thing, fine.. a few tests later the GPU faults on that address... :(
23:37 HdkR: Faults because it tries prefetching and fails or something?
23:38 karolherbst: maybe?
23:38 karolherbst: let me disable prefetching
23:39 karolherbst: huh.. can one even disable that
23:40 karolherbst: yeah.. since ampere
23:43 karolherbst: HdkR: the odd thing is.. we always bind to CB1 and we overwrote that binding multiple times since the fault
23:44 HdkR: huh, is the faulting address actually in the bound range? Or does it potentially roll off the end?
23:45 karolherbst: dunno
23:45 HdkR: Since it could be trying to fetch past a 4k page, but nothing is at that location
23:45 HdkR: Since GPU page is larger
23:46 karolherbst: mhhhh
23:46 karolherbst: how large?
23:46 HdkR: 16k or 64k depending on generation
23:48 karolherbst: the kernel totally ignores my alignment arg
23:48 HdkR: Could be an alignment fault as well then
23:48 karolherbst: I suspect it's clamped by the size
23:49 karolherbst: okay lol
23:49 karolherbst: size *= 0x10 and the fault is gone
23:49 HdkR: Some things in the pipeline will fetch beyond the declared size sadly
23:49 karolherbst: okay
23:49 karolherbst: good to know
23:49 karolherbst: now I have GPC1/T1_0 faulting
23:49 karolherbst: whatever T1_0 is
23:49 HdkR: Probably TPC identification
23:50 karolherbst: yeah.. something
23:53 HdkR: Not sure how the two TPCs are declared in fault land, could be T1_0 and T1_1, or T0_0, or T2_0 :P
23:54 HdkR: Oh wait, skipped a level, there are more TPCs than two per GPC
23:59 karolherbst: but anyway.. good to know it might be something stupid like this
23:59 karolherbst: I suspect for UBOs we might want to make sure to always have something 64k aligned...