00:27 JayFoxRox: how do I decode the Z-buffer [NV2A again]? I have a Z24S8 buffer, it is part of a tiled region, but untiling like the CPU still gives garbage [although, I can make out the outline of some objects]
00:28 JayFoxRox: is Z-buffer even compressed in RAM? how do I decode it? where in pgraph does it store whether Z-buffer is compressed?
00:33 JayFoxRox: it's this one I guess? https://github.com/envytools/envytools/blob/a6a1dc54d59c87ce0a2718415c3ceb2ec2488008/nvhw/comp.c#L677
00:35 mwk: JayFoxRox: yes
00:35 mwk: the information about compression is stored not in pgraph, but in pfb
00:35 mwk: to use compression, you need a tiled region
00:35 mwk: and there's one reg per tiled region to enable compression
00:36 mwk: and to select the area of tag ram that corresponds to it
00:36 mwk: the tag ram itself is internal SRAM in the PFB unit
00:36 mwk: it can be accessed through the rdi interface, which you can see in hwtests
00:37 mwk: but that's basically only used for suspend/resume, normally you just let the gpu handle it
00:39 JayFoxRox: yeah, but as I'm basically doing apitrace on hw-level, I also need to access such stuff
00:40 JayFoxRox: I'll try disabling the compression for the tile by poking the PFB reg and seeing if that explodes :P
00:42 JayFoxRox: ewww the GPU did not like that lol. the output is a bit messed up now
00:52 JayFoxRox: mwk: the information about compression is in both, PFB *and* PGRAPH
00:54 JayFoxRox: the 8 words at 0x100300 [PFB] are sort-of mirrored in 0x400980 [PGRAPG]. when I attempted to only disable it in PFB, I suddenly got visual errors which looked like wireframe drawings when the distance was greater
00:54 JayFoxRox: so I believe one is responsible for writes, the other one for reads
00:54 JayFoxRox: *PGRAPH
01:01 JayFoxRox: it seems that PGRAPH also has a mirror of the other tile info (PFB: 0x240 == PGRAPH: 0x900)
01:02 mooch2: JayFoxRox, does it work now?
01:03 JayFoxRox: mooch2: I can run games without z-buffer compression by disabling it in those 2 regs. yes. - and I assume I can handle tiling the same way [however, just disabling will probably cause issues until the surfaces have been overwritten]
01:04 mooch2: oh nice!
01:04 JayFoxRox: but I still have trouble getting good depth buffer dumps as the tiling is different than for RGB [looking at 640x480 R8G8B8X8 and Z24S8. so both are 32bpp and I expected I could just read R8G8B8X8 and then concatenate R|G|B|X values - but nonono.. I'll push; show a screenshot on discord and then head to bed]
01:09 mooch2: oh weird o.o
01:27 mwk: JayFoxRox: even better, there are actually three copies of that info
01:28 mwk: 1 in PFB, 1 in PGRAPH for output purposes, and 1 in PGRAPH for texturing purposes
01:28 mwk: and the third one is only accessible via RDI
01:29 JayFoxRox: mwk: FML.. disabling zbuffer works reasonably well by just poking PFB and PGRAPH regs. however, disabling tiling that way does not work for me - I'm probably missing those regs via RDI then
01:30 JayFoxRox: I hate working with RDI tho because you never know if another process sneaks in. I'm running on a dev machine, connected to xbox via network. so there's easily a couple of milliseconds between reads/writes.
01:31 JayFoxRox: mwk: can you give any pointers how to disable tiling? [or, even if unrelated, a link to some of the RDI stuff you were talking about?]
01:42 JayFoxRox: RAM is indeed untiled with my register poking, but image looks bad on display connected to xbox, except for 2D sprites. so I assume this is the texturing you are talking about
02:20 mwk: JayFoxRox: just clear the enable bit in the base reg
02:20 mwk: I don't think I have the RDI stuff described anywhere public, but let's see..
02:20 mwk: this is the PGRAPH RDI
02:22 mwk: space 0xea
02:22 mwk: https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nvkm/engine/gr/nv20.c#L148
02:23 mwk: these are the regs
02:23 mwk: the RDI copy is for texturing
02:24 mwk: the mmio PGRAPH copy is for drawing to the framebuffer
02:24 mwk: oh, and I have the full layout of 0xea space in my notes
02:25 mwk: 0x00 should be set to the same value as PFB cfg0 [ie. 0x100200], 0x04 to cfg1 [0x100204]
02:26 mwk: 0x08 ... I'm not sure, but I think it should be set to the same value as comp_max_tag, 0x100320
02:26 mwk: 0x0c same as comp_offset, 0x100324
02:26 mwk: 0x10+i*4 is tile addr, ie. 0x100240+i*0x10
02:27 mwk: 0x30+i*4 is tile limit, ie. 0x100244+i*0x10
02:27 mwk: 0x50+i*4 is tile pitch, ie. 0x100248+i*0x10
02:27 nyef: What's "RDI" in this context?
02:28 mwk: and 0x90+i*4 is comp_region, 0x100300+i*4
02:28 mwk: hmm
02:28 nyef: I get this urge to REPZ STOSD when I see it, but that seems a bit unlikely
02:29 mwk: I guess 0x70+i*4 could be tile status, but I don't have this in my notes; oh well
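The 0xea layout mwk lists above can be written down as a small lookup table. The following is only a sketch of that mapping, not code from any driver: the function name `pfb_mirror` and the constant `NUM_TILES = 8` are assumptions (the count is taken from the "8 words at 0x100300" remark earlier), the 0x08 entry carries mwk's own uncertainty, and the guessed 0x70 tile status range is left out because it was not in his notes.

```python
# Offsets inside PGRAPH RDI space 0xea and the PFB MMIO register each one
# mirrors, as listed in the chat. NUM_TILES is an assumption based on the
# "8 words at 0x100300" remark.
NUM_TILES = 8

FIXED = {
    0x00: 0x100200,  # PFB cfg0
    0x04: 0x100204,  # PFB cfg1
    0x08: 0x100320,  # comp_max_tag (mwk was not sure about this one)
    0x0C: 0x100324,  # comp_offset
}

# (RDI base offset, PFB base address, PFB stride in bytes)
PER_TILE = [
    (0x10, 0x100240, 0x10),  # tile addr
    (0x30, 0x100244, 0x10),  # tile limit
    (0x50, 0x100248, 0x10),  # tile pitch
    (0x90, 0x100300, 0x04),  # comp_region
]


def pfb_mirror(rdi_offset):
    """Return the PFB MMIO address mirrored at rdi_offset, or None."""
    if rdi_offset in FIXED:
        return FIXED[rdi_offset]
    for rdi_base, pfb_base, stride in PER_TILE:
        index, rem = divmod(rdi_offset - rdi_base, 4)
        if rem == 0 and 0 <= index < NUM_TILES:
            return pfb_base + index * stride
    return None  # offset not covered by the layout given in the chat


if __name__ == "__main__":
    for off in (0x00, 0x14, 0x34, 0x9C, 0x70):
        addr = pfb_mirror(off)
        print(hex(off), "->", hex(addr) if addr is not None else "unknown")
```

Keeping the table as data makes it easy to walk every known offset and copy the current PFB value into the matching RDI slot, which is roughly what keeping the three copies in sync amounts to.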
02:29 mwk: nyef: I have no idea what that stands for, but it's basically an address-data pair of registers that allows access to various RAMs and registers inside PGRAPH
02:30 nyef: Hrm... So, similar to the xfer areas for Tesla?
02:30 mwk: well, the PGRAPH RDI; PFB has its own RDI pair as well
02:30 mwk: yeah, except much less crazy
02:31 nyef: Fair enough, I guess.
02:31 mwk: as in, you just poke an address to one reg, and poke/read data through the other reg
02:31 mwk: while accessing xfer areas on Tesla is a whole dance
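The address/data pair mwk describes can be sketched as follows. Everything concrete here is an assumption rather than something stated in the chat: the register addresses 0x400750 and 0x400754 are the ones nouveau names NV10_PGRAPH_RDI_INDEX and NV10_PGRAPH_RDI_DATA, the `(space << 16) | offset` index encoding follows the 0x00EA0010-style values in the linked nv20.c, and bit 0 is taken to be the tile enable bit. `FakePgraph` is a hypothetical stand-in so the sketch runs without hardware; it does not model any auto-increment the real index register may have.

```python
# Assumed register addresses (not given in the chat).
RDI_INDEX = 0x400750
RDI_DATA = 0x400754


class FakePgraph:
    """Hypothetical stand-in for real MMIO access."""

    def __init__(self):
        self.index = 0
        self.ram = {}

    def wr32(self, addr, value):
        if addr == RDI_INDEX:
            self.index = value
        elif addr == RDI_DATA:
            self.ram[self.index] = value & 0xFFFFFFFF

    def rd32(self, addr):
        if addr == RDI_INDEX:
            return self.index
        if addr == RDI_DATA:
            return self.ram.get(self.index, 0)
        return 0


def rdi_write(bus, space, offset, value):
    # Step 1: poke the address into the index register.
    bus.wr32(RDI_INDEX, (space << 16) | offset)
    # Step 2: poke the data through the data register.
    bus.wr32(RDI_DATA, value)


def rdi_read(bus, space, offset):
    bus.wr32(RDI_INDEX, (space << 16) | offset)
    return bus.rd32(RDI_DATA)


if __name__ == "__main__":
    bus = FakePgraph()
    # Tile 0 base in space 0xea, with the assumed enable bit (bit 0) set.
    rdi_write(bus, 0xEA, 0x10, 0x00400001)
    base = rdi_read(bus, 0xEA, 0x10)
    # Clear the assumed enable bit to disable tiling for that region.
    rdi_write(bus, 0xEA, 0x10, base & ~1)
    print(hex(rdi_read(bus, 0xEA, 0x10)))
```

Because each access is two separate MMIO operations, nothing stops another client from changing the index in between, which is the race JayFoxRox worries about when poking over the network.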
02:32 nyef: Mmm. Crazy DMA stuff, isn't it?
02:32 mwk: btw, the official name for the xfer areas seems to be the "ramchain"
02:32 mooch2: oh god that's both funny AND terrifying
02:32 mwk: yeah, either DMA, or some complex poking through MMIO
02:33 nyef: And said poking through MMIO is completely unattested in rnndb?
02:33 mwk: it's mentioned
02:33 mwk: addresses 0x400400 and up
02:34 mwk: poke the address to 400408, if writing poke the data to 400420+, hit the control reg, wait for busy flag to clear
02:34 nyef: Ah, the STRAND bits?
02:34 mwk: yes
02:34 nyef: Hmm.
02:35 mwk: that thing is quite unwieldy though
02:35 mwk: as in, you can transfer at most 20 words at a time
02:35 mwk: and have to wait for idle, etc
02:36 nyef: Seems like there'd have to be another mechanism as well...
02:36 mwk: uh?
02:36 mwk: you have two
02:36 mwk: this 400400 area, and the ctxctl microcode
02:36 mwk: not enough?
02:36 nyef: Not enough, given my mental model of the microengine.
02:36 mwk: how so?
02:37 nyef: Or maybe there's more that can be done with STRAND.
02:37 mwk: I mean, that area gives you full read/write access to the ramchains from software
02:37 mwk: and the ctxctl also has full access that it can use for context switching
02:37 mwk: by dma
02:37 mwk: what more could you want?
02:38 nyef: Everything that I've seen that the microengine can do appears to be mirrored in terms of MMIO access, except for XFER/SEEK. And thus I find it hard to believe that you can't kick off an XFER DMA from MMIO.
02:39 nyef: Actually, wait. One more thing that I haven't found: Is the microengine call stack mapped to MMIO space somewhere?
02:39 mwk: nope, the call stack is invisible
02:40 mwk: sorry, but there's good evidence that there is no way of triggering a DMA transfer other than via microcode
02:40 mwk: and FWIW, the same applies to register DMA transfer
02:41 mwk: and the evidence is that the microcode used by nvidia has a branch that can be used by the host to manually request a DMA transfer
02:41 mwk: which presumably wouldn't be needed if the host could just do that on its own
02:42 nyef: Okay, I guess that's fair.
02:42 nyef: Unless it's more a matter of the microengine being able to do it more reliably or something.
02:43 mwk: btw, don't count on consistency
02:43 mwk: this is a complex piece of hardware designed by many people
02:44 mwk: and under time pressure
02:44 nyef: Mmm. And under layers of backwards-compatibility requirements, too.
02:45 mwk: so you'll see lots of things that don't make sense, registers that don't control anything, bits of functionality that were started but never finished, etc
02:45 mooch2: mwk, yeah, nvidia hardware even in the nv3 days was a huge clusterfuck tbh :/
02:46 mwk: gods know how many of these UNK123 bits are just connected nowhere at all
02:48 mooch: actually, speaking of nv3, mwk, do you think you could do some hwtests for nv3's pfifo?
02:48 mooch: *please
02:49 mwk: eh.
02:50 nyef: mooch: Do you not have one of your own that you can test with?
02:51 mwk: mooch: you know you have a standing offer of ssh access to my test machines, right?
02:56 mwk: eh.
02:56 mwk: I sort of promised myself not to touch that big rack of gpus until I properly get my current project off the ground
03:05 mooch: oh sorry
03:05 mooch: i kinda forgot the credentials lol
03:05 mooch: also, my systems have gone through some HEAVY ass changes so lol
03:05 mooch: also, i have no IDEA how to reverse engineer anything
03:06 mooch: much less hardware
03:07 nyef: What? As an emulator author, reverse-engineering is a basic skill!
03:09 nyef: There's also a goodly overlap between reverse-engineering and debugging.
03:29 mooch: nyef, i just implement the shit people have already researched, or i just follow the ocs :/
03:29 mooch: *docs
03:31 nyef: The number of times I've had to deal with no documentation, incomplete documentation, or inaccurate documentation...
03:33 nyef: You have an emulator, you have driver software. The driver software is presumed to be "good", and the emulator is buggy or incomplete. You're in good shape: You can instrument the emulator to see what the driver is doing, and then try to replicate the effects with real hardware to see what it does.
03:38 nyef: Right now one of the projects I'm working on involves an 8051. By trying likely-looking but undocumented things, we got a partial program dump. From there, we found program bankswitch controls, some amount of writable program memory, and a way to do code injection to get a more-complete program dump.
03:39 nyef: Do your research, try things, make notes as to what happens, come up with models for how the thing behaves, see what ideas that gives you for further things to try.
03:43 nyef: Overall, reverse-engineering is a good skillset to have. Also helps with security stuff. Actually, a goodly amount of computer security research IS reverse-engineering.
03:52 mooch: yeah, but i can't even read ASM
03:52 mooch: like, not even 6502 asm
03:52 mooch: i can't write it either
03:53 nyef: You don't need ASM to work with an mmiotrace, which is more or less what you'd be trying to get from your emulator anyway.
03:54 nyef: Or you could LEARN some ASM.
03:54 nyef: It's just another programming language... or, well, a whole bunch of programming languages, but rarely more than one or two per CPU family.
03:55 nyef: (Modern x86 types being one of the rare cases of having at least THREE incompatible instruction sets on the same CPU.)
03:56 nyef: You could even come at it backwards: Grab the documentation, write a CPU simulator.
03:59 nyef: (The Z80 is one of the better CPUs to try that stunt with, btw. Good test suite, good documentation, and it's used in quite a few systems.)
04:08 nyef: The 6502 might work for that stunt as well, actually, as long as you don't need any of the undocumented opcodes (which can actually vary in behavior from system to system due to bus loading and signal drive strength and other effects).
04:10 mooch: nyef, i can already write cpu simulators
04:10 mooch: i've done it before
04:18 nyef: Then being able to write code to run on them should be well within your capabilities. Reading it can be a bit harder if you're working with disassembly, of course, but still doable, especially if you can step through it one instruction at a time.
04:26 mooch: i can't. 86box never had a debugger
04:26 mooch: besides, i'm no longer part of that project anyway, i'm going to adapt my nvidia knowledge into a different emulator
04:45 nyef: ... You can't load a cracked copy of softice into that thing and run it on an MDA card?
09:18 JayFoxRox: mwk: thanks! this RDI stuff works like a charm. I can't express how helpful you are :)
11:51 karolherbst: mwk: did you ever look at what the CHIL PWM does in your nvd9?
11:54 karolherbst: it seems to be connected through SMBUS
11:55 karolherbst: I doubt it is used for volting, as there are 4 VID GPIOs and a sane table with the combinations
11:57 karolherbst: ohhh, it is there for DDR memory :P
11:57 karolherbst: :O
11:58 karolherbst: ohh wow, crap
12:01 karolherbst: I uhm... ignore that for now
12:08 karolherbst: mupuf: this is interesting
12:09 karolherbst: https://gist.github.com/karolherbst/f164bfb51641a0f27d3af95c1d641998
12:09 karolherbst: both parsings are kind of correct, but also totally wrong
12:10 karolherbst: yes, we have 3 GPIOs
12:10 karolherbst: and yes, those combinations might set a proper voltage
12:10 karolherbst: but we have this I2C CHIL PWM
12:10 karolherbst: _maybe_ we can just get away by setting those GPIOs and the PWM does the magic
12:11 karolherbst: but then we need a different logic on how to decide whether we do GPIO or PWM based volting
12:12 karolherbst: I mean, the GPIO parsing just depends on other tables and makes something useful by accident
12:12 karolherbst: sure
12:12 karolherbst: there are a few weird things in the PWM parsing though
12:12 karolherbst: "frequency 1 kHz" and "base voltage 306250 µV (unk = 16)"
12:12 karolherbst: and the range is also stupid
12:16 karolherbst: mhhh
12:16 karolherbst: the header is longer
12:16 karolherbst: c0 00 with the CHIL one
12:27 karolherbst: mhh wait, it is longer in the "normal" PWM case
12:30 karolherbst: duh
12:30 karolherbst: the GPIO one is the shorter one....
12:30 karolherbst: mhh, not always though
12:30 karolherbst: ohh right, in the CHIL case
12:39 karolherbst: mupuf: -- Mode GPIO (header-generated), Base voltage 306250 µV, voltage step 6250 µV, acceptable range [712500, 1150000] µV --
12:39 karolherbst: so... that makes sense
12:40 karolherbst: or at least more sense
14:10 karolherbst: mupuf: sooo, I think this makes sense: https://gist.githubusercontent.com/karolherbst/4f910f9f5fa9e0600ccd2ea88e3dcebe/raw/05b145f42fbc9481948b3c6074ede1eabf19b6ca/gistfile1.txt
14:11 karolherbst: the frequency can be up to 1.2MHz though, so I think this unknown field is just 8 bit big
14:11 karolherbst: we also have it in the normal PWM, but tied to the base voltage
14:11 karolherbst: mhhhh
14:12 karolherbst: mupuf: it might be even the same PWM used on the other Kepler cards
14:12 karolherbst: just not wired up through I2C
14:12 karolherbst: or maybe it is, just hidden
14:14 mwk: karolherbst: tbh I don't know what a CHIL PWM is
14:15 karolherbst: mwk: an extdev on the GPU
14:15 karolherbst: mwk: EXTDEV 2: type 0x48 [CHIL_SMBUS_8112A/8112B/8225/8228] at 0x60 defbus 0
14:15 mwk: ah
14:16 karolherbst: it seems to be used on very few Kepler GPUs to control the shader voltage
14:16 karolherbst: one in the entire repository
14:16 karolherbst: but this is an I2C one
14:16 karolherbst: also "Up to 3 VID select lines for dynamic voltage transitions"
14:16 karolherbst: and this one Kepler has only 3 VID
14:16 karolherbst: which is quite insane
14:17 karolherbst: as you got the full volt map table there
14:17 karolherbst: well
14:17 karolherbst: on your nvd9 it seems to be used for something else
14:17 karolherbst: maybe the memory voltage, dunno
14:17 karolherbst: maybe just for reading back the set voltage?
14:20 karolherbst: mhhh, but that 306250 value can be divided by 6250
14:20 karolherbst: ...
14:20 karolherbst: soo, maybe it is the base voltage of the PWM, but we really should only set a value inside [712500, 1150000]
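The arithmetic behind karolherbst's observation can be checked with a short sketch. It uses only the numbers from the parsed table (base 306250 µV, step 6250 µV, range [712500, 1150000] µV); the linear model `voltage = base + n * step` and the function name `voltage_to_step` are assumptions for illustration, not something confirmed about the CHIL part.

```python
BASE_UV = 306250               # base voltage from the parsed table, in µV
STEP_UV = 6250                 # voltage step from the parsed table, in µV
RANGE_UV = (712500, 1150000)   # acceptable range from the parsed table, in µV


def voltage_to_step(uv):
    """Return n such that uv == BASE_UV + n * STEP_UV (assumed linear model)."""
    lo, hi = RANGE_UV
    if not lo <= uv <= hi:
        raise ValueError("voltage outside the acceptable range")
    steps, rem = divmod(uv - BASE_UV, STEP_UV)
    if rem:
        raise ValueError("voltage is not on a step boundary")
    return steps


if __name__ == "__main__":
    # The base voltage itself is a whole number of steps.
    print(BASE_UV // STEP_UV, BASE_UV % STEP_UV)
    # Step indices at both ends of the acceptable range.
    print(voltage_to_step(RANGE_UV[0]), voltage_to_step(RANGE_UV[1]))
```

Under that model both ends of the acceptable range fall on whole step indices, which is consistent with 306250 being the base of the scale while only values inside the range are meant to be set.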
14:24 karolherbst: mhhh, we have an mmiotrace
14:46 karolherbst: nice
14:46 karolherbst: https://www.digchip.com/datasheets/download_datasheet.php?id=3747065&part-number=CHL8212-XXCRT
14:55 karolherbst: I don't see an I2C register overview though