00:01 karolherbst: mhhh
00:01 karolherbst: maybe your vbios comes from somewhere else... weird
00:03 karolherbst: mhh PCIROM.. not sure if nvagetbios can even read that
00:03 karolherbst: but that there is no PROM is _really_ odd
00:05 karolherbst: it's probably the rom file of the sysfs device
00:05 karolherbst: PWD
00:05 karolherbst: johns: "/sys/bus/pci/devices/0000:01:00.0/rom"
00:05 karolherbst: you could run that file through "nvbios" and see if that parses alright
01:06 johns: karolherbst: okay interesting, I'll try that tomorrow
13:01 johns: I should've mentioned, this is also a machine with coreboot installed; could very well be a coreboot issue
13:01 karolherbst: johns: ohhh, good point
13:02 karolherbst: maybe the size of the rom is wrongly advertized or something...
13:09 hell__: did anyone say coreboot?
13:10 hell__: johns: which payload are you using?
13:13 karolherbst: hell__: do you know how much testing the PCI ROM part gets, because that's something we rarely ever have to use on GPUs
13:13 karolherbst: johns: anyway.. a dump of the PCI ROM file would be interesting to look at
13:14 johns: hell__: seabios
13:14 hell__: johns: right, make sure coreboot is *not* running the VBIOS (SeaBIOS will do it)
13:15 karolherbst: hell__: the issue is that nouveau gets a mismatched signature of the VBIOS
13:17 hell__: karolherbst: PCI Option ROM support is mainly tested with AMD graphics cards. the thing is, coreboot can run the VBIOS, but the payload can also run the VBIOS
13:17 karolherbst: sure, but that's not the issue here
13:18 karolherbst: nouveau validates that the VBIOS is in order
13:18 karolherbst: to do that, we calculate a checksum over its size
13:18 hell__: right
13:19 karolherbst: maybe we get garbage at the end instead of 0 or something like that...
13:19 karolherbst: anyway
13:19 karolherbst: would be interesting to look at the ROM file
13:19 karolherbst: the VBIOS itself contains the checksum, so I doubt that this value is wrong
13:20 hell__: > Expansion ROM at 000c0000 [disabled] [size=128K]
13:20 hell__: hmmmmm
13:21 karolherbst: normally there are other ways to retrieve the vbios from the GPU but seems like on this system PCI ROM is the only available method
13:21 hell__: on plug-in cards, the VBIOS has to be on the card itself
13:21 johns: hell__: so, that line is also in the info from a friend's system, with a slightly different card, that's working: https://paste.debian.net/1247854/
13:23 karolherbst: johns: question is, does it use the PCI ROM or a different way of fetching the VBIOS
13:23 karolherbst: hell__: sure, but the PCI ROM also comes from the card itself
13:24 karolherbst: johns: how painful would it be to try stock firmware?
13:24 karolherbst: but from what I can tell it looks more like a coreboot bug, but can't be 100% sure atm
13:26 johns: karolherbst: very painful unfortunately
13:26 hell__: so coreboot has a Kconfig option to decide whether to load option ROMs stored on plug-in cards, this is disabled by default on SeaBIOS (because SeaBIOS will do it anyway)
13:27 karolherbst: hell__: okay.. question is just if the option ROM is accessible from the OS
13:28 johns: (also a little relevant, a different nouveau driven card has been working in the machine, I don't have the model number in front of me yet though)
13:28 karolherbst: yeah.. I suspect the other GPUs all use the PROM "method" of loading the vbios
13:28 karolherbst: just that GPU is special
13:28 karolherbst: which is more or less the default way
13:29 hell__: are the PROM and other methods described anywhere?
13:29 karolherbst: not really, but those are Nvidia GPU specific things
13:29 karolherbst: those just live inside the MMIO space
13:30 karolherbst: PROM is at 0x300000 I think
13:30 karolherbst: whatever bar that was
13:30 hell__: coreboot uses the PCI method then
13:30 karolherbst: I have no idea how the firmware GPU interactions work there tbh
13:31 hell__: read the PCI_ROM_ADDRESS (0x30) register from the GPU's PCI config space to get the address
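(Aside: a minimal sketch of what decoding that register looks like, assuming a standard type 0 PCI header. The constants mirror PCI_ROM_ADDRESS, PCI_ROM_ADDRESS_ENABLE and PCI_ROM_ADDRESS_MASK from Linux's pci_regs.h; `decode_rom_bar` is a hypothetical helper name and the config space bytes are synthetic, not taken from johns' machine.)

```python
import struct

PCI_ROM_ADDRESS = 0x30            # Expansion ROM base address register (type 0 header)
PCI_ROM_ADDRESS_ENABLE = 0x01     # bit 0: ROM decoding enabled
PCI_ROM_ADDRESS_MASK = 0xFFFFF800 # bits 31:11 hold the base address


def decode_rom_bar(config: bytes):
    """Return (base_address, enabled) from raw PCI config space bytes."""
    (raw,) = struct.unpack_from("<I", config, PCI_ROM_ADDRESS)
    return raw & PCI_ROM_ADDRESS_MASK, bool(raw & PCI_ROM_ADDRESS_ENABLE)


# Synthetic 64-byte config header with a made-up ROM BAR value.
cfg = bytearray(64)
struct.pack_into("<I", cfg, PCI_ROM_ADDRESS, 0xF7000001)
base, enabled = decode_rom_bar(bytes(cfg))
print(hex(base), enabled)
```

On a real system the same bytes could be read from the device's sysfs `config` file, but that needs the right permissions and was not tried here.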
13:31 karolherbst: maybe the PCI ROM thing is always accessible, who knows, but we don't use it as the "default" one because of its limitations I think
13:31 karolherbst: not sure
13:32 johns: hmm.. what if the pcirom error is a red herring, and that the real issue is that the method that should've been chosen earlier wasn't?
13:32 karolherbst: johns: that's why I was asking for the ROM file :P
13:32 johns: right
13:32 hell__: karolherbst: limitations?
13:32 karolherbst: but if the GPU says there is no data, then there is no data
13:32 karolherbst: hell__: size mostly I think
13:32 hell__: ah
13:32 karolherbst: newer VBIOS are like multiple MBs
13:33 johns: later today I should be able to try the extraction, but I'm a little skeptical that'll work given the hexdump of the file
13:33 karolherbst: johns: you could do a nvapeek 0x300000
13:33 karolherbst: (part of envytools)
13:34 karolherbst: johns: of which file? /rom ?
13:34 johns: karolherbst: of what nvagetbios gave me
13:34 karolherbst: yeah, that's using prom
13:35 karolherbst: prom needs to contain 0xaa55 at location 0x0, which nvagetbios says wasn't there
13:35 karolherbst: so I suspect it's all 0
13:35 karolherbst: nouveau also calculated 0 over the entire prom, so...
13:35 karolherbst: and the signature is just |ing all the values
13:35 hell__: for me, it'd be useful to get a coreboot log, coreboot config and know which machine this is
13:35 karolherbst: uhm
13:35 karolherbst: actually adding
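(Aside: a rough sketch of the additive check being described, assuming the usual option ROM convention that all bytes of the image sum to 0 modulo 256. `rom_checksum` and `fix_checksum` are hypothetical names, not nouveau functions. Note that an all-zero buffer also sums to 0, which fits the "calculated 0 over the entire prom" remark.)

```python
def rom_checksum(image: bytes) -> int:
    """Add up every byte and keep only the low 8 bits; 0 means 'valid'."""
    return sum(image) & 0xFF


def fix_checksum(image: bytearray) -> None:
    """Adjust the last byte so the whole image sums to 0 modulo 256."""
    image[-1] = (image[-1] - rom_checksum(image)) & 0xFF


# Synthetic 512-byte image: signature, size byte, jump opcode, then zeros.
img = bytearray(b"\x55\xaa\x78\xe9" + bytes(508))
print("before:", hex(rom_checksum(img)))
fix_checksum(img)
print("after: ", hex(rom_checksum(img)))
```

A checksum of 0 on its own therefore says little: the 0x55 0xaa signature still has to be present at offset 0.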
13:36 johns: hell__: I can partly answer at least one of those questions without being in front of it, it's a kcma-D8
13:36 hell__: uh-oh
13:36 hell__: that's ancient coreboot
13:37 johns: yes. the card is also ancient though, maybe that helps? :)
13:37 hell__: well, the codebase for the KCMA-D8 and KGPE-D16 was horrible
13:39 johns: karolherbst: you think that nvbios on /rom might work even though cat on /rom doesn't?
13:40 hell__: ah, I think I remember something now...
13:40 johns: yesterday when I tried echo 1 > /rom and then cat /rom, I got the same signature error
13:40 hell__: `MAINBOARD_FORCE_NATIVE_VGA_INIT` is selected on these boards, so coreboot is told to initialize the Aspeed BMC without using any option ROM
13:41 karolherbst: johns: huh....
13:42 karolherbst: johns: yeah... dunno... would have to look at whatever that file contains
13:42 karolherbst: but if the firmware doesn't give us the ROM...
13:42 hell__: johns: do you get any video before loading an OS? if so, is it from the onboard video or from the Nvidia card?
13:42 johns: hell__: no video at all
13:42 johns: with this card
13:42 hell__: any video from the onboard VGA port?
13:43 johns: I can check that.. with the previous card, I had video from boot before OS
13:43 hell__: hrm
13:44 hell__: I suspect coreboot initializes the onboard VGA and this messes with the Nvidia card's option ROM
13:47 johns: interesting
13:47 hell__: what exactly is the graphics card that's giving you issues?
13:48 hell__: looking up GK104GLM says it's a Quadro K3100M, but when I look for info about it I only find MXM cards
13:48 johns: hell__: GV-N670OC-4GD Gigabyte GeForce GTX 670 4GB GDDR5
13:50 hell__: hrm.
13:51 hell__: I think I'll need a coreboot log: build util/cbmem and run it as `sudo cbmem -C`
13:51 hell__: https://github.com/coreboot/coreboot/tree/master/util/cbmem
13:53 johns: hell__: thanks, I will do that
13:58 karolherbst: hell__: why would coreboot mess with the option rom though?
13:59 hell__: I'm not sure yet
14:00 hell__: karolherbst: where does nouveau get the option ROM from, if not via PROM? mind pointing me at the code please?
14:03 karolherbst: hell__: via pci_map_rom
14:03 karolherbst: no idea what linux is doing there
14:04 hell__: this? https://github.com/torvalds/linux/blob/master/drivers/pci/rom.c#L136
14:08 hell__: ok, so pci_enable_rom enables memory-mapping the ROM using the same `PCI_ROM_ADDRESS` coreboot uses
14:08 hell__: but pci_enable_rom skips all of this if the IORESOURCE_ROM_SHADOW flag is set, i.e. the ROM has been shadowed (copied) to RAM already
14:09 karolherbst: hell__: I guess so
14:10 hell__: https://github.com/torvalds/linux/blob/master/arch/x86/pci/fixup.c#L311
14:10 johns: fwiw, I did also test booting with no driver (blacklisted nouveau), and still couldn't cat /rom
14:10 hell__: and this flag can be set in some cases
14:12 hell__: when this flag is set, the VBIOS is assumed to be at 0xc0000
14:12 hell__: this is needed on systems where the VBIOS is supplied by the boot firmware, e.g. laptops with onboard GPU, where the VBIOS is not on a separate chip
14:13 hell__: johns: got a full dmesg?
14:13 karolherbst: yeah... but there we usually just read it out via ACPI
14:13 hell__: how does this ACPI mechanism work?
14:14 karolherbst: via an ACPI call
14:14 karolherbst: it's all in drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowacpi.c
14:15 hell__: I know AMD/ATI uses a dedicated ACPI table known as VFCT, which coreboot fills in (only for AMD/ATI cards)
14:15 karolherbst: apparently for nvidia it's _ROM on the device node
14:16 hell__: just saw, and that's described in the ACPI spec: https://imgur.com/Wnkix1c.png
14:18 hell__: I don't think coreboot does this
14:19 karolherbst: yeah... sounds like it should :P
14:19 hell__: actually, it does
14:20 karolherbst: anyway, only relevant for laptops
14:21 karolherbst: don't see why using pci_map_rom shouldn't work, but... maybe there is also a more proper way of doing this especially if the firmware is able to modify the data and make any signature invalid
14:21 hell__: and it only does so if it can figure out the ACPI device path for the corresponding PCI device
14:21 hell__: there are some VBIOSes which self-modify
14:21 karolherbst: pleasant
14:22 hell__: or get patched at runtime by vendor firmware, I'm not sure
14:22 karolherbst: anyway, another reason to actually get the ROM and look at it :)
14:22 johns: are there any other ways to get the rom? nvflash also failed..
14:22 johns: (yesterday)
14:22 hell__: my current theory is that Linux decides that the data at 0xc0000 is a valid VBIOS even though it isn't
14:23 karolherbst: mhhh
14:23 hell__: it's easy to see with a dmesg log if Linux is using the data at 0xc0000
14:23 hell__: > [ 0.227624] pci 0000:01:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
14:24 hell__: trying to figure out what's going on without logs is like trying to squeeze water out of a stone
14:25 karolherbst: yeah...
14:28 hell__: the signature check that fails is checking that a value corresponds to "PCIR"
14:31 hell__: hmmmm, 0xe938aa55
14:31 hell__: https://en.wikipedia.org/wiki/Option_ROM
14:31 hell__: > The first two bytes of the ROM must be 55 AA. The third byte indicates the ROM size, and the fourth byte is where the BIOS begins execution of the option ROM to initialize it before the system boots. Often this initialization is done by a 3 byte jump instruction starting with hexadecimal value E9.
14:32 karolherbst: yep
14:32 karolherbst: on my laptop it's 0xeb78aa55
14:32 hell__: IIRC eb is another jump
14:32 karolherbst: yeah
14:32 karolherbst: on modern GPUs you have multipart VBIOS
14:33 karolherbst: ahh eb is relative jump btw
14:34 karolherbst: uhm.. with an 8 bit value
14:34 karolherbst: instead of 16/32
14:34 hell__: https://c9x.me/x86/html/file_module_x86_id_147.html
14:34 hell__: yep
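(Aside: a small sketch that splits the two 32-bit values quoted above into the header bytes from the Wikipedia excerpt. It assumes the word was read little-endian from offset 0 of the ROM; `decode_rom_start` is a hypothetical helper. 0xE9 and 0xEB are the x86 near-jump and short-jump opcodes mentioned in the discussion.)

```python
import struct

JUMP_OPCODES = {0xE9: "jmp rel16", 0xEB: "jmp rel8"}


def decode_rom_start(word: int) -> dict:
    """Split the first little-endian dword of an option ROM into its fields."""
    b = struct.pack("<I", word)
    return {
        "signature_ok": b[0:2] == b"\x55\xaa",  # first two bytes must be 55 AA
        "size_byte": b[2],                      # third byte: ROM size field
        "entry_opcode": JUMP_OPCODES.get(b[3], hex(b[3])),
    }


for value in (0xE938AA55, 0xEB78AA55):
    print(hex(value), decode_rom_start(value))
```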
14:41 johns: I'll come back with the full dmesg and coreboot logs later on.. should've just saved the full dmesg instead of just the nouveau parts, but back then I thought this was going to be easy ;)
14:42 hell__: anyone here with a Nvidia Optimus system, or some multiple-graphics-card setup? I'd like a `lspci -v` log
14:45 karolherbst: hell__: all devices or just the GPU?
14:45 ajax: hell__: https://paste.centos.org/view/raw/561b8ebf
14:45 hell__: all
14:45 ajax: ^ coffeelake + rx480
14:45 ajax: not nvidia, i admit
14:46 karolherbst: hell__: https://gist.githubusercontent.com/karolherbst/e766e6e3af565f331324ceb944a79090/raw/2ff737da12ae890e03c09f4681873318dd48c8e7/gistfile1.txt
14:46 hell__: ajax: hm, the Intel iGPU seems to not use an Expansion ROM, which is weird
14:47 hell__: karolherbst: thanks, that log is exactly what I needed
14:47 ajax: hell__: this is a workstation machine, and the radeon is set to be the boot primary in the bios
14:47 hell__: makes sense, I wonder what the Intel iGPU is doing then
14:47 ajax: on-mobo intel machines typically do not expose their rom through the pci "rom" bar
14:47 hell__: d'oh
14:47 karolherbst: drivers have to POST non primary GPUs themselves
14:48 ajax: they just stick it at 0xc0000 as a platform detail
14:48 hell__: that was a brain fart from me
14:48 ajax: though i think e820 and friends will tell you about it
14:48 karolherbst: ohh
14:48 hell__: but yes, it makes sense that the Intel iGPU's VBIOS is not visible through the PCI ROM BAR because it doesn't have a dedicated chip, it must be copied to RAM
14:49 ajax: i assume intel dg2 and such have actual rom resources in the pci device but i do not know for sure
14:49 hell__: any plug-in card must have a flash chip with the VBIOS
14:50 hell__: "desktop" PCIe cards do, and so do MXM cards (if they don't, good luck running them on a different system)
14:50 karolherbst: do people actually ever replace MXM cards?
14:51 hell__: idk, but the cards I've seen have flash chips
14:51 karolherbst: always sounded like the only reliable way of doing it is to use a different MXM card for the same laptop, but different config
14:51 karolherbst: but yeah.. they probably do have flash chips
14:53 hell__: in the Optimus log, the Intel VBIOS is shadowed at 0xc0000 because it's the primary video adapter (the one used for boot video), and the Nvidia VBIOS is at 0xb4000000, and it's not shadowed
14:54 hell__: I know it's not shadowed because the PCIe root port (PCI bridge) for the GPU is decoding the b3000000-b40fffff [size=17M] memory range
14:54 karolherbst: what makes me curious is why the nvidia vbios is only 128k for johns
14:55 karolherbst: mhh although such small vbios actually do exist for this era of GPUs
14:55 hell__: because that's the available space between 0xc0000 and 0xdffff
14:56 karolherbst: mhhh, right
14:56 hell__: that's the "ISA Expansion Area"
14:57 hell__: the 1st MiB of memory on x86 systems is extremely legacy
14:58 karolherbst: so I've heard
14:58 hell__: like, DOS era legacy
14:58 hell__: https://imgur.com/TtIuR6C.png
14:58 hell__: the 640 KiB block at the bottom is what DOS would use
15:01 hell__: the 128 KiB block above is the A-segment, and it can be used as "legacy video area" (for a framebuffer) or SMRAM (the RAM used by SMM, System Management Mode). SMRAM is now located elsewhere (IIRC, using the A-segment for SMRAM is only possible with 4 CPU cores or less)
15:03 hell__: actually, the underlying RAM would be used as SMRAM when the CPU accesses the segment in SMM mode, but when outside SMM the accesses are routed to the GPU's framebuffer
15:03 hell__: so it's both at the same time
15:04 hell__: the 128 KiB block above the A-segment is the C-segment, and it's used to store Option ROMs
15:06 hell__: and the two 64 KiB blocks above are the E-segment and F-segment, used for the BIOS
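(Aside: the layout just described, written out as a table. The address ranges are the conventional x86 ones and are filled in here as an assumption, since the linked image is not reproduced in the log; only 0xc0000-0xdffff is quoted directly in the discussion. `region_of` is a hypothetical helper.)

```python
# (start, end inclusive, description) for the first MiB of x86 address space
LEGACY_MAP = [
    (0x00000, 0x9FFFF, "conventional memory (what DOS would use)"),
    (0xA0000, 0xBFFFF, "A-segment: legacy video area / SMRAM"),
    (0xC0000, 0xDFFFF, "option ROM area (shadowed VBIOS lives here)"),
    (0xE0000, 0xEFFFF, "E-segment: BIOS"),
    (0xF0000, 0xFFFFF, "F-segment: BIOS"),
]


def region_of(addr: int) -> str:
    """Return the description of the legacy region containing addr."""
    for start, end, name in LEGACY_MAP:
        if start <= addr <= end:
            return name
    raise ValueError("address is above the first MiB")


for start, end, name in LEGACY_MAP:
    size_kib = (end - start + 1) // 1024
    print(f"{start:#07x}-{end:#07x} {size_kib:4d} KiB  {name}")
```

The 128 KiB size that lspci reports for the shadowed ROM is simply the size of the 0xc0000-0xdffff window.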
15:06 karolherbst: I wouldn't be surprised if the VBIOS is actually bigger than 128k, but coreboot or seabios are just fetching the first 128k bytes and make them available via 0xc0000
15:07 karolherbst: and the kernel picks it up
15:07 karolherbst: but still not sure why that would affect nouveau
15:08 karolherbst: unless the size of the PCI resource is wrong
15:08 hell__: from the lspci log, looks like the PCIROM method would read at 0xc0000: https://paste.debian.net/1247853/
15:08 karolherbst: ahh wait...
15:08 karolherbst: pci_get_rom_size "fixes" the size
15:09 karolherbst: pci_get_rom_size is actually quite interesting
15:10 hell__: looks like it's making use of the information available in the PCI Firmware Specification
15:10 karolherbst: yeah.. and the size would be capped at 128k
15:10 karolherbst: ehh wait
15:10 karolherbst: shouldn't
15:12 hell__: image length is in units of 512 bytes
15:12 karolherbst: mhhh
15:13 karolherbst: not sure how the ROM resource is actually advertized, but it looks like the pci_* code could read from 0xc0000 if that's what the kernel sets up as the ROM thing
15:14 hell__: yes, that's what I was saying earlier
15:14 karolherbst: okay.. so if the vbios is indeed bigger, then we are in trouble
15:15 karolherbst: and there are 160kb big VBIOS for desktop cards for this generation of GPUs
15:15 hell__: if you're interested in understanding what pci_get_rom_size accesses, this is the PCI Firmware Specification
15:16 hell__: https://drive.google.com/file/d/1lnC6I-Yv4tQ2uc2gT5fqTbOhF36fDvE2/view
15:16 karolherbst: yeah.. I am mildly familiar with that PCIR stuff
15:17 hell__: I wonder if the VBIOS is designed to be partially loaded or something
15:17 karolherbst: yeah...
15:17 karolherbst: you usually have multiple parts on newer GPUs
15:17 karolherbst: but not on kepler
15:17 karolherbst: at least not that I am aware of
15:18 hell__: I have a GTX 1050 Ti here, what's the size of the VBIOS?
15:18 hell__: or how do I get it
15:18 karolherbst: it's variable
15:18 karolherbst: but I suspect you can read it out via prom
15:19 hell__: is nvagetbios part of envytools?
15:19 karolherbst: "nvagetbios -s prom" from https://github.com/envytools/envytools should do the trick
15:19 karolherbst: yeah
15:19 karolherbst: you should have this multipart vbios stuff
15:20 karolherbst: the first part is some valid data, then you have some blob and the third part is the rest of the vbios
15:20 hell__: so lspci thinks the expansion rom is shadowed and 128K in size
15:21 hell__: > Card has second bios
15:22 hell__: lolwat, it's 1 MiB
15:23 karolherbst: yeah
15:23 karolherbst: it's small :)
15:23 hell__: oh, there's a bunch of FFs
15:23 johns: yeah
15:23 karolherbst: although I think only the first ~300kb should be valid data
15:24 karolherbst: maybe more
15:24 hell__: I see valid strings at around 0x84f00
15:25 hell__: BIOS Certificate Check Failed!!!
15:25 karolherbst: :P
15:25 karolherbst: I think that's the second part or so
15:25 karolherbst: uhm.. wait
15:25 hell__: I think there are two copies
15:25 karolherbst: no, 0x84f00 is way too late
15:25 karolherbst: yeah
15:26 karolherbst: could be
15:26 hell__: same strings at around 0x4f00
15:26 karolherbst: that sounds more like it
15:26 karolherbst: around 0x2f000 is the start of the second part
15:27 karolherbst: where you see bunch of random data
15:27 karolherbst: or was it 0x1f000?
15:27 hell__: https://drive.google.com/file/d/1TzZV6X--UNvYm4ShKV57gwzRFRcslaBB/view?usp=sharing
15:28 karolherbst: right.. random stuff starts at 0xf000 for me
15:28 hell__: card is a Gigabyte gv-n105twf2oc-4gd
15:28 karolherbst: and the third part starts at 0x1fe00
15:28 karolherbst: let's check yours
15:29 karolherbst: yeah.. same for you
15:29 hell__: I see another 55aa at 0xf000
15:29 karolherbst: that's part 2
15:30 hell__: what's the strap peek?
15:30 karolherbst: some nvidia specific value
15:30 karolherbst: encodes some information relevant for parsing the vbios
15:30 karolherbst: like VRAM info
15:30 hell__: 0x400080
15:31 karolherbst: doesn't matter for most parts (99.9%) :)
15:31 hell__: just provided it for the sake of completeness
15:31 karolherbst: but yeah.. you really have more data at roughly 50%
15:31 karolherbst: I don't
15:32 karolherbst: interesting
15:32 karolherbst: maybe I should diff both halves
15:32 karolherbst: maybe your VBIOS got updated by the nvidia driver?
15:32 karolherbst: who knows
15:33 hell__: ah, could be
15:33 hell__: currently running the nvidia driver
15:36 karolherbst: heh
15:36 karolherbst: both halves are identical
15:37 karolherbst: maybe a security copy then
15:39 karolherbst: hell__: anyway.. your third part starts at 0x2d300
15:40 karolherbst: the first and third part are actually relevant for the nouveau driver
15:40 karolherbst: and as you can tell, that won't fit the PCI ROM window
15:40 karolherbst: well.. if it's at 0xc0000
15:41 hell__: oh, I know what the thing at 0xf000 is
15:41 hell__: https://imgur.com/c5FYtmY.png
15:41 hell__: so the option ROM at 0x0 is type 0 (Intel x86, PC-AT compatible)
15:41 hell__: and the option ROM at 0xf000 is type 3 (EFI)
15:42 karolherbst: yeah.. somebody mentioned here before it's some kind of blob for the firmware
15:42 hell__: I think it's just the EFI GOP driver
15:42 karolherbst: probably
15:43 karolherbst: just that you don't have it on all UEFI capable GPUs
15:43 karolherbst: I think
15:43 karolherbst: or maybe you have...
15:43 karolherbst: the third part is what's newish
15:44 hell__: the 0xf000 option ROM lacks the Configuration Utility Code Header and the DMTF CLP entry point (pointers are 0)
15:44 hell__: these two things sound extremely legacy
15:44 karolherbst: yeah...
15:44 karolherbst: I just checked.. other kepler vbios have it as well
15:44 karolherbst: fun.. there is even a certificate
15:44 hell__: do these VBIOSes support UEFI?
15:44 hell__: ah, yes, the GOP driver is probably signed
15:44 karolherbst: yeah, they do
15:44 hell__: most likely because Secure Boot
15:45 karolherbst: yeah.. makes sense
15:45 karolherbst: soo mhh
15:45 karolherbst: that makes me wonder
15:46 karolherbst: on the random vbios I picked it starts at 0xf400
15:46 karolherbst: let's check that 1MB file here
15:47 hell__: the hex number after the 55AA should be the size in 512-byte blocks
15:47 hell__: er, not really
15:47 karolherbst: heh.. that GPU only has 512kB of prom
15:48 hell__: it's 16 bytes after the following sequence of bytes: 50 43 49 52 de 10
15:48 karolherbst: and it also has a copy of the VBIOS
15:48 hell__: that's just "PCIR" followed by Nvidia's PCI vendor ID, 0x10de
15:48 karolherbst: yeah
15:49 hell__: 16 bytes after the start of the sequence is the image length field, 2 bytes
15:49 karolherbst: mhhh
15:49 karolherbst: odd
15:49 hell__: so, 0x0078 for my legacy VBIOS, 0x0084 for the GOP VBIOS
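(Aside: a sketch of walking the images in a ROM dump the way described above, assuming the generic PCI Firmware Specification layout: a 16-bit pointer to the "PCIR" structure at offset 0x18 of each image, vendor ID at +0x04, image length in 512-byte units at +0x10, code type at +0x14, and a "last image" bit at +0x15. `walk_images` and `make_image` are hypothetical helpers and the ROM is synthetic; Nvidia-specific extensions are not handled.)

```python
import struct


def walk_images(rom: bytes) -> list:
    """Collect offset, vendor, length and code type of each option ROM image."""
    images, offset = [], 0
    while offset + 0x1A <= len(rom) and rom[offset:offset + 2] == b"\x55\xaa":
        (pcir_ptr,) = struct.unpack_from("<H", rom, offset + 0x18)
        pcir = offset + pcir_ptr
        if pcir + 0x16 > len(rom) or rom[pcir:pcir + 4] != b"PCIR":
            break
        (vendor,) = struct.unpack_from("<H", rom, pcir + 0x04)
        (blocks,) = struct.unpack_from("<H", rom, pcir + 0x10)
        images.append({
            "offset": offset,
            "vendor": vendor,
            "length": blocks * 512,        # image length is in 512-byte units
            "code_type": rom[pcir + 0x14], # 0 = x86 PC-AT, 3 = EFI
        })
        if rom[pcir + 0x15] & 0x80 or blocks == 0:  # last-image indicator
            break
        offset += blocks * 512
    return images


def make_image(blocks: int, code_type: int, last: bool) -> bytes:
    """Build a synthetic image with only the fields walk_images() reads."""
    img = bytearray(blocks * 512)
    img[0:2] = b"\x55\xaa"
    struct.pack_into("<H", img, 0x18, 0x40)       # PCIR placed at +0x40
    img[0x40:0x44] = b"PCIR"
    struct.pack_into("<H", img, 0x44, 0x10DE)     # Nvidia vendor ID
    struct.pack_into("<H", img, 0x50, blocks)
    img[0x54] = code_type
    img[0x55] = 0x80 if last else 0x00
    return bytes(img)


rom = make_image(0x78, 0, False) + make_image(0x84, 3, True)
for image in walk_images(rom):
    print({k: hex(v) for k, v in image.items()})
```

With a 0x78-block legacy image, the next image starts at 0x78 * 512 = 0xf000, which lines up with the second 55aa seen at 0xf000.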
15:49 karolherbst: although I am sure the GOP vbios just parses the legacy one
15:50 hell__: I wouldn't be surprised if they're just two copies of the same thing
15:50 hell__: although the GOP driver probably needs more code to handle some stuff
15:50 karolherbst: mhh maybe
15:50 karolherbst: yeah
15:50 karolherbst: the GOP one doesn't seem to contain any of the data tables
15:51 karolherbst: it's really just random code I think
15:51 karolherbst: well.. + certs
15:51 hell__: I think GOP drivers need to be able to set up linear framebuffers
15:51 karolherbst: right
15:51 karolherbst: and it has to contain the vbios parser as well :D
15:51 hell__: not just text-mode and certain modes
15:51 hell__: VBIOS parser?
15:51 karolherbst: yeah
15:52 hell__: what does it need to parse?
15:52 karolherbst: so there are tables to set up VRAM and shit
15:52 karolherbst: for POST
15:52 karolherbst: and those need to be parsed
15:52 karolherbst: so the x86 code in there has code to parse those tables
15:52 hell__: maybe the tables are in a data section, and the VBIOS code treats them like a table
15:52 karolherbst: and init the GPU
15:52 karolherbst: yep
15:52 hell__: like an array
15:52 karolherbst: nvbios parses the vbios
15:53 hell__: where does linux do this?
15:53 karolherbst: in nouveau
15:53 hell__: yes, where in nouveau?
15:53 karolherbst: drivers/gpu/drm/nouveau/nvkm/subdev/bios/
15:53 hell__: thanks
15:54 karolherbst: with nvbios it looks like this (cut off at random): https://gist.github.com/karolherbst/a100fa6c15f87746703566ca368214ee
15:55 karolherbst: it can't really handle that multipart thing well
15:55 karolherbst: so here a kepler one: https://gist.github.com/karolherbst/e7241f3113032c449008bd7770cab710
15:56 hell__: how do I tell nvbios about the strap?
15:56 karolherbst: you have a file called strap_peek alongside the vbios
15:56 hell__:creates a folder
15:57 karolherbst: but anyway.. nvbios can't deal with the vbios you have
15:57 karolherbst: at least.. not for things located in the third part
15:57 karolherbst: the vbios points inside the third part ignoring the second one even exists
15:57 karolherbst: so all pointers need to be fixed up
15:57 karolherbst: it's very annoying
15:58 karolherbst: never got around to do that, because nouveau just provides fixed vbios.rom files
15:58 hell__: why would something point to the 2nd part (the GOP driver)?
15:58 karolherbst: 3rd
15:58 hell__: ah
15:58 karolherbst: anyway
15:58 hell__: maybe the 3rd part are blobs/tables and they need to be remapped
15:58 karolherbst: the GOP/BIOS code needs to parse e.g. the init scripts
15:59 karolherbst: those are run when you connect a display and stuff
15:59 karolherbst: and they just change values of GPU registers
15:59 karolherbst: which is kind of an assembly language, which you also have to parse :)
16:00 karolherbst: hell__: well, yeah
16:00 karolherbst: but you also have some tables in the first part referencing into the third
16:00 karolherbst: you have a hierarchy of tables
16:00 hell__: I guess the tables are some sort of reg script
16:00 karolherbst: no
16:00 karolherbst: only the init scripts are
16:00 karolherbst: the other things are literally tables
16:00 hell__: ah
16:01 karolherbst: they have a header and then rows of data
16:01 hell__: sorry, I meant the scripts
16:01 karolherbst: ahh yeah, that's somewhat a script
16:01 hell__: does it have conditional logic?
16:01 karolherbst: yes
16:01 karolherbst: check drivers/gpu/drm/nouveau/nvkm/subdev/bios/init.c
16:02 karolherbst: it's really like a trivial assembly language :D
16:02 hell__: any function to start reading from?
16:02 karolherbst: "* init opcode handlers" after that
16:02 hell__: oh, I see
16:03 karolherbst: it's incomplete, because on newer GPUs we rely on firmware parsing it for us
16:03 karolherbst: but it gives a rough idea on what it can do
16:04 karolherbst: on ampere and newer the GPU does all of that on its own btw
16:04 karolherbst: the GPU can literally POST itself
16:05 hell__: ampere, is that 30 series? the thing for which an "open-source" driver was released?
16:05 karolherbst: yep
16:05 hell__: ah yes
16:05 karolherbst: now that the GPU and firmware can do most of the things the driver did, the driver doesn't contain anything valuable :P
16:05 karolherbst: well.. there are a few interesting bits, but...
16:06 hell__: it's something similar to what changed with AMD Ryzen CPUs
16:08 hell__: before, the boot firmware (BIOS or something else) would need to do memory init, which is one of the most complicated initialization steps
16:08 karolherbst: yeah
16:08 karolherbst: memory reclocking code is massive in nouveau as well
16:09 karolherbst: and no fun to reverse engineer, because every memory is different
16:09 hell__: but AMD changed things with Ryzen: memory init is now done by the PSP (Platform Secure Processor, a coprocessor) before the x86 cores are released from reset
16:10 karolherbst: I am convinced that this is the only way you can do it properly anyway
16:11 karolherbst: at least if you care one bit about power consumption as well
16:12 hell__: ah, power consumption is one of the reasons why memory training (at least on Intel systems) suddenly became a lot more complex
16:12 karolherbst: well.. the problem isn't training on its own, the problem is, if you want to change power modes to go into power savings/low perf modes, you need to cut access to RAM
16:13 karolherbst: which opens up quite a lot of issues
16:13 hell__: yup
16:14 hell__: and you don't really need to train much if the RAM isn't replaceable, and the board routing is already accounted for
16:14 karolherbst: yeah... could be
16:15 karolherbst: we still have to do link training on GPUs, but that's more because you can freely change clocks
16:15 karolherbst: and depending on the clocks, the training changes
16:16 hell__: compare this with code that needs to make RAM work without prior knowledge of the RAM or board it runs on
16:17 karolherbst: well, that's the situation you have on GPUs as well
16:17 karolherbst: vendors can stick whatever RAM modules
16:17 hell__: doesn't the GPU VBIOS have tables with RAM info?
16:17 karolherbst: sure, but there is some indirection going on there
16:18 karolherbst: might be not as bad as with desktops though
16:18 karolherbst: but I do know that the RAM chips used actually matters for the code
16:18 hell__: yup
16:20 hell__: where does nouveau configure memory stuff?
16:20 karolherbst: inside drivers/gpu/drm/nouveau/nvkm/subdev/fb/
16:20 karolherbst: we actually write a script
16:20 karolherbst: which we pass to the PMU to execute
16:21 karolherbst: drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgk104.c should contain it for kepler
16:22 hell__: > gk104_ram_nuts
16:22 karolherbst: the entire code is insane, I don't understand any of it :P
16:23 karolherbst: afaik it's not even complete and doesn't work with some GPUs
16:23 hell__: and I presume none of these things are documented anywhere
16:24 karolherbst: of course not
16:26 hell__: hmmm, maybe I should try to find a random GPU and toy with it
16:26 karolherbst: would be fun I bet
16:27 hell__: how does one figure out such low-level details? looking at what the proprietary driver does?
16:27 karolherbst: soooo
16:28 karolherbst: the nvidia driver has a nice dev feature.. you can upload your own vbios and it uses that instead of the GPUs one
16:28 karolherbst: so you flip some bits, mmiotrace nvidia and check what changes
16:28 hell__: ah, so tracing
16:28 karolherbst: yeah, mostly
16:29 hell__: hmmm, I get a 500 error code: https://nouveau.freedesktop.org/MmioTrace.html
16:29 karolherbst: "fun"
16:30 karolherbst: ahh 500.. seems like gitlab is at it again
16:32 karolherbst: yeah.. looks like it
16:32 karolherbst: hell__: wiki sources are here: https://gitlab.freedesktop.org/nouveau/wiki/-/tree/master/sources
16:32 hell__: thanks
16:33 karolherbst: mmiotrace is actually quite the cool tool, but I suspect you find better docs somewhere else
16:33 karolherbst: e.g. https://wiki.ubuntu.com/X/MMIOTracing#:~:text=Check%20the%20trace-,What%20is%20an%20MMIO%20Trace,what%20hardware%20state%20it%20reads.
16:33 hell__: I'm familiar with the concept of tracing register accesses
16:35 hell__: I haven't done a lot of it myself, I generally decompile stuff and try to figure out what it's doing
16:36 karolherbst: yeah... well.. nouveau is all blackbox re for random reasons
16:36 karolherbst: uhm
16:36 karolherbst: clean room I think is the more correct term
16:36 hell__: i.e. not disassembling/decompiling anything, I presume
16:36 karolherbst: except shaders
16:37 hell__: that's too high level for my taste :D
16:37 karolherbst: :D
16:37 karolherbst: but honestly, I am sure no one would be able to keep up with nvidia decompiling their driver
16:39 hell__: wdym?
16:39 karolherbst: well, the driver is huge and they do update it quite often
16:40 hell__: ah
16:40 karolherbst: also I think all leftover function names are randomized
16:40 hell__: when I decompile stuff I only look for specific things
16:41 karolherbst: sure, but where to start.. but yeah.. you might find mmio register addresses and can go from there
16:42 hell__: and updates could be ignored unless they affect the areas being investigated
16:42 hell__: an interesting idea is to correlate traces with code
16:46 karolherbst: those traces are huge
16:47 karolherbst: we have several ones and they are usually in the single to two digit MB xz compressed data area
16:48 hell__: hrm, no way to split them into smaller chunks? i.e. this is memory stuff, this is something else, etc
16:49 karolherbst: yeah well.. it's all very random, but what you usually do is to simply insert markers so you know when you start doing something
16:54 hell__:is reading https://nvidia.github.io/open-gpu-doc/BIOS-Information-Table/BIOS-Information-Table.html
16:55 hell__: VBIOS pointers may point to data beyond the end of the PC-compatible (legacy BIOS, Code Type 00h) image. If a UEFI (Code Type 03h) image follows the PC-compatible image, then the pointer must be adjusted to be an offset into the data following the UEFi Image.
16:55 hell__: If (pointer > PC-compatible image length) { adjusted_pointer = pointer + UEFI image length }
16:55 karolherbst: yep
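(Aside: the quoted rule as a tiny function. The image lengths in the example are derived from the 0x0078 and 0x0084 length fields mentioned earlier, multiplied by 512; the pointer values are made up for illustration and `adjust_pointer` is a hypothetical name.)

```python
def adjust_pointer(pointer: int, pc_image_len: int, uefi_image_len: int) -> int:
    """Skip over the UEFI image when a pointer reaches past the legacy image."""
    if pointer > pc_image_len:
        return pointer + uefi_image_len
    return pointer


PC_LEN = 0x78 * 512    # legacy (Code Type 00h) image length in bytes
UEFI_LEN = 0x84 * 512  # UEFI (Code Type 03h) image length in bytes

for ptr in (0x4F00, 0x10000):  # made-up pointers: inside and beyond the legacy image
    print(hex(ptr), "->", hex(adjust_pointer(ptr, PC_LEN, UEFI_LEN)))
```

Pointers that stay inside the legacy image are left alone; anything beyond it is shifted by the length of the UEFI image, which is the fix-up a parser has to apply to a raw multipart dump.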