04:46 Lightsword: has anyone done any reverse engineering of the nvidia signed firmware signature format?
04:47 imirkin: what do you mean?
04:47 imirkin: there's code in the kernel to load it
04:47 Lightsword: imirkin, I mean in regards to how the GPU actually validates it
04:47 imirkin: ah ok
04:47 imirkin: not sure. not me though.
04:48 Lightsword: imirkin, was curious because I took a look at a signature file(gpccs_sig) and it looked like it may be possible to forge signatures
04:49 imirkin: that'd be nice.
04:50 Lightsword: imirkin, so I see what looks like a 512 bit signature along with a 128 bit hash
04:50 Lightsword: imirkin, which implies it’s potentially 512 bit RSA with a 128 bit hash like MD5
04:51 Lightsword: 512 bit RSA of course was broken back in 1999
04:51 imirkin: iirc AES-128 was considered more likely
04:52 imirkin: like an HMAC or something. dunno.
04:52 Lightsword: now I could be wrong and the 512 bit signature could be ECDSA and in that case it wouldn’t be breakable, but 512 bit is unusual for ECDSA
04:52 Lightsword: imirkin, AES-128 is a symmetric algo
04:53 imirkin: but you can use it to hash things
04:53 Lightsword: AES isn’t a hash algo though
04:54 Lightsword: it wouldn’t make sense to use AES in a signature scheme, would make sense to use it in an encryption scheme, but the decryption key would have to exist in the hardware
04:55 imirkin: from what i can tell, the signature is 16 bytes. not 100% sure what the remainder of that sig file is
04:55 imirkin: the code is pretty convoluted
04:56 Lightsword: 16 bytes sounds like a hash, typically a signature scheme involves an asymmetric algo like RSA or ECDSA plus a hash algo like SHA256
04:56 imirkin: it does look like there's 64B of actual data in there
04:57 Lightsword: so what you have is likely a signature of a hash of the firmware
04:57 Lightsword: it’s 64B plus 16B, 64B for the signature+16B for the hash
04:58 imirkin: https://hastebin.com/cokohayevi.cpp
04:59 imirkin: that's the signature
05:00 imirkin: GP10x signatures are 192 bytes btw
05:00 imirkin: i think the GM20x signatures correspond to this: https://hastebin.com/sabaforafo.cpp
05:01 imirkin: maps fairly nicely.
05:01 imirkin: basically it's 4 128-bit signatures
05:02 imirkin: 2 for "prod", 2 for "debug" (which is a special gpu mode, or perhaps a special gpu chip entirely used for internal testing at nvidia)
05:02 imirkin: (or 1 256-bit value. who knows)
05:05 Lightsword: hmm, 128-bit signatures would imply a form of ECC
05:05 Lightsword: elliptic curve crypto
05:06 Lightsword: 128-bit RSA would be totally broken
05:15 Lightsword: even 128-bit ECC is sketchy, standard is 256 bit (key lengths for asymmetric algos are higher than for symmetric ones)
05:16 imirkin: yeah, could be a single 256-bit value for all i know
05:16 imirkin: (why would there be two signature keys?)
05:16 Lightsword: well the firmware has multiple files right?
05:17 imirkin: otoh they did some weird shit with video decoding, whereby diff gpu's would have different keys. and they would ship the generic firmware as one blob, and then write in the "right" key at runtime
05:17 imirkin: hmmm... data could be signed too i guess
05:17 imirkin: that seems reasonable
05:17 Lightsword: well my assumption was that the data and signature is separated
05:17 imirkin: (no idea wtf the *_bl.bin files are btw)
05:18 Lightsword: ie you load data then load signature in order to get the GPU to execute
05:18 imirkin: well, the falcon engines take the program (potentially) as two separate segments -- code and data
05:18 imirkin: the signature is separate from that
05:18 Lightsword: so sounds like it could be 1 signature for code and 1 for data
05:18 imirkin: although it's also possible to mash it all into a single segment
05:18 imirkin: they don't do that for gr, but it's done for the video decoding firmware
05:19 imirkin: so yeah, my guess is 128-bit
05:19 imirkin: and there's a secret inside the gpu
05:20 Lightsword: well a proper signature scheme would never rely on the device having a secret at all
05:20 imirkin: could be HMAC_MD5 or something
05:20 Lightsword: the device would have a burned in public key that can only validate signed firmware, not create it
05:20 imirkin: wouldn't count on it...
05:22 Lightsword: well any other way would be pretty stupid :P
05:22 Lightsword: but then again…I’ve seen plenty of stupid
05:22 imirkin: stupid for maximizing the reliability of a scheme
05:23 imirkin: but silicon ain't free
05:23 Lightsword: I mean, anything involving symmetric keys is vulnerable to forged signatures if anyone manages to extract a key
05:23 imirkin: not too easy to extract the key from the silicon though
05:24 imirkin: and the computational resources required to break even hmac-md5 are fairly substantial
05:24 imirkin: (if that's really what it is)
05:26 Lightsword: HMAC_MD5 is symmetric though right?
05:28 imirkin: sure
05:29 Lightsword: so that wouldn’t even need to be brute forced at all, you would just need to find a way to dump the key
05:30 imirkin: yeah, it's pretty straightforward
05:30 imirkin: just use acid to etch off a layer of the silicon
05:30 imirkin: take an x-ray image
05:30 imirkin: repeat
05:30 imirkin: and then analyze
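To make the point about symmetric schemes concrete, here is a minimal sketch assuming the HMAC-MD5 guess from the discussion. The secret, the firmware bytes, and the function names are hypothetical; nothing here reflects how the GPU actually validates firmware. It shows that whoever holds the verification secret can also produce valid tags for modified images.

```python
import hashlib
import hmac

# Hypothetical values for illustration only; not NVIDIA's real scheme or key.
DEVICE_SECRET = b"secret-burned-into-silicon"
firmware = b"original firmware image"


def sign(key: bytes, image: bytes) -> bytes:
    """Compute a 16-byte HMAC-MD5 tag over the image."""
    return hmac.new(key, image, hashlib.md5).digest()


def device_accepts(image: bytes, tag: bytes) -> bool:
    """The device recomputes the tag with the same secret and compares."""
    return hmac.compare_digest(sign(DEVICE_SECRET, image), tag)


vendor_tag = sign(DEVICE_SECRET, firmware)

# Anyone who extracts the secret can tag arbitrary code; no brute force needed.
forged_image = b"modified firmware image"
forged_tag = sign(DEVICE_SECRET, forged_image)
```

With an asymmetric scheme the device would hold only a public key, so dumping it would not allow new signatures to be created.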
05:32 Lightsword: as an example of what a secure signature scheme looks like: I do embedded linux firmware development, and I create firmware updates with a manifest file containing sha256 hashes of each image that makes up the firmware. that manifest is then signed with RSA 2048. to validate it, the device first checks that the manifest's signature verifies against a key embedded in the device, then checks that each image has a sha256 hash
05:32 Lightsword: matching the one in the manifest
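The two-step validation described above can be sketched roughly as follows. The JSON manifest layout and the function names are assumptions made for illustration, and `verify_signature` is a hypothetical placeholder for an RSA-2048 check against the public key embedded in the device; the log does not describe the real file format.

```python
import hashlib
import json
from typing import Callable, Dict


def build_manifest(images: Dict[str, bytes]) -> bytes:
    """Serialize a manifest mapping image name -> SHA-256 hex digest."""
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in images.items()}
    return json.dumps(digests, sort_keys=True).encode()


def validate_update(manifest: bytes, signature: bytes,
                    images: Dict[str, bytes],
                    verify_signature: Callable[[bytes, bytes], bool]) -> bool:
    """Accept the update only if the manifest is signed and every image matches."""
    # Step 1: the manifest itself must carry a valid signature.
    if not verify_signature(manifest, signature):
        return False
    # Step 2: the set of images must match the manifest exactly...
    expected = json.loads(manifest)
    if set(expected) != set(images):
        return False
    # ...and each image must hash to the digest recorded for it.
    return all(hashlib.sha256(images[name]).hexdigest() == digest
               for name, digest in expected.items())
```

Only the manifest needs an asymmetric signature; the images are covered indirectly through their hashes.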
05:35 Lightsword: imirkin, is there any code that can be dumped from the GPU before firmware is loaded into it?
05:36 imirkin: hmmmmm interesting question
05:36 imirkin: so the GPU silicon actually has about a half-dozen little CPUs running around
05:36 imirkin: ("falcon" is the name of the architecture)
05:36 Lightsword: basically if you can get a dump of the code with the signature validation scheme you can then see if it’s breakable
05:37 imirkin: i don't think that's running on a user-visible cpu though
05:37 imirkin: i think that's "in the silicon"
05:37 imirkin: although, how deep, who knows
05:37 imirkin: i'm not the best person to talk about these things... probably mwk would know more
05:37 Lightsword: well the GPU still has limited functionality when running unsigned firmware right?
05:37 imirkin: not quite
05:38 imirkin: when you load firmware onto the CPU, it can run in one of three modes
05:38 imirkin: unsigned, low-secure, and high-secure
05:38 imirkin: in the non-high-secure mode, certain register accesses just fail
05:39 Lightsword: so it sounds to me like the signature checking is done fairly late in the initialization sequence
05:39 imirkin: i think it's just a blacklist, since a whitelist would be prohibitive
05:39 imirkin: the signature checking is done whenever you upload code
05:39 imirkin: which is also done via register writes
05:40 imirkin: the logic that handles register writes (and reads) is entirely opaque to us
05:40 Lightsword: yeah, so the routine may be in some memory somewhere
05:40 imirkin: the "firmware" in question runs at a higher level than the logic dealing with registers
05:40 imirkin: since the firmware runs on actual CPUs, whereas the register read/write logic is triggered by accessing MMIO regions
05:42 imirkin: while it seems likely that this hidden logic is not all fixed-function, i'm not aware of any of its details in any way. i don't think it's user-modifiable in any way.
05:43 Lightsword: is it roughly equivalent to say intel CPU microcode?
05:43 imirkin: not sure
05:43 imirkin: so like ... take a simpler device
05:43 imirkin: say an ethernet card
05:44 imirkin: if it's in any way modern, there will be a BAR pointing at a register region
05:44 imirkin: which you can read/write in order to make the packets go
05:44 imirkin: configure dma and whatnot
05:44 imirkin: internally in that ethernet card, there may be some firmware that makes that logic work
05:44 imirkin: but it's not accessible
05:45 imirkin: this is the kind of thing i'm talking about with nvidia. the gpccs/fecs/etc firmware -- that's all at a much higher level. it's not to drive basic card functions. it's to coordinate different parts of the gpu together.
05:46 imirkin: i'm not aware of any information regarding that lower-level thing
05:47 Lightsword: gpccs/fecs/etc is that what’s here https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/nvidia ?
05:48 imirkin: yes
05:48 imirkin: so like gpccs = gpc context switch
05:48 imirkin: when the gpu wants to switch contexts, it invokes this bit of code, which will read out a bunch of gpu state
05:48 imirkin: and save it off somewhere
05:48 imirkin: and load another context's worth of state in
05:48 imirkin: fecs = frontend context switch
05:48 imirkin: same idea, but slightly different data
05:49 imirkin: acr is ... either a separate engine, or part of the PMU engine, which is the thing that has to actually load things in
05:49 Lightsword: based on the file sizes it looks roughly equivalent to intel microcode sizes
05:49 imirkin: ok, but that microcode does nothing more than what it says on the tin -- just reading/writing a bunch of data
05:50 imirkin: [and also responding to certain firmware calls which can be made in the 3d command stream]
05:50 Lightsword: so the custom ucode that nouveau was making before firmware signing was the equivalent to what’s in linux-firmware right?
05:50 imirkin: yes
05:50 imirkin: https://github.com/skeggsb/nouveau/tree/master/drm/nouveau/nvkm/engine/gr/fuc
05:51 imirkin: look at the *.fuc files
05:51 Lightsword: well intel microcode from my understanding can modify CPU instructions to some degree
05:51 imirkin: this code just sets up a little RTOS on a CPU inside the GPU which services various external requests
05:52 imirkin: there's about a half dozen such CPUs in there
05:52 imirkin: which all have slightly different access to things
05:52 Lightsword: is it itself an RTOS or is it just doing some initialization?
05:52 imirkin: each one of those *.fuc files is a RTOS
05:52 imirkin: [except com.fuc, which is a helper for both of them]
05:53 imirkin: https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nvkm/engine/gr/fuc/gpc.fuc#L291
05:53 imirkin: (for example)
05:53 imirkin: and there is a bit of init in there too, i guess
05:54 Lightsword: so it’s essentially the GPU instruction interpreter I guess?
05:56 imirkin: no, that's all silicon
05:56 imirkin: (or, at least, opaque to us)
05:57 Lightsword: so the commands referenced here “pulls a command from the queue and executes its handler” are what exactly?
05:57 imirkin: those are the commands sent to the RTOS
05:57 imirkin: like "perform context switch now!"
05:58 imirkin: that might be the only command for the gpc. i think the other thing can get more commands.
05:58 imirkin: the commands come in over a fifo-type queue + interrupt iirc
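The "pull a command from the queue and execute its handler" pattern can be sketched in a few lines. This is not the falcon firmware: the command name and handler below are hypothetical, and real firmware would sleep until the next interrupt instead of returning when the queue is empty.

```python
from collections import deque


def run_commands(queue, handlers):
    """Pull each command from the queue and execute its handler."""
    log = []
    while queue:
        command = queue.popleft()
        handler = handlers.get(command)
        if handler is None:
            log.append(f"unknown command: {command}")
        else:
            log.append(handler())
    return log


# Hypothetical command set with a single entry, as suggested for the gpc.
handlers = {"ctx_switch": lambda: "saved old context, loaded new context"}
pending = deque(["ctx_switch", "bogus"])
results = run_commands(pending, handlers)
```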
05:59 Lightsword: so commands telling the RTOS to perform various tasks I guess?
06:00 imirkin: yep
06:00 imirkin: very few tasks
06:00 imirkin: calling it an RTOS is being charitable
06:00 imirkin: but i think it technically qualifies
06:00 Lightsword: yeah, that’s what was confusing me, it seemed too simple to even be an RTOS
06:01 imirkin: in that it will sit there and loop, and respond to inputs :)
06:01 Lightsword: sounds more like a bare metal type thing than anything
06:02 imirkin: well, it's a real CPU inside of that giant piece of silicon (several of them, in fact)
06:02 imirkin: and it runs code that we upload into it.
06:02 Lightsword: do applications run on top of it?
06:02 imirkin: no
06:02 imirkin: it's only used for helping things along
06:02 imirkin: the PMU is used to help with reclocking
06:02 Lightsword: wouldn’t the term RTOS imply that it’s running other applications?
06:02 imirkin: (it might e.g. report to the OS updated loads/power consumption/etc)
06:02 imirkin: not in my mind
06:03 imirkin: like in a plane, i expect that the flight controls are on a RTOS
06:03 imirkin: not exactly what you want to load some random piece of software onto :)
06:03 imirkin: i guess you might still have user/kernel code, this is all, effectively, kernel
06:04 imirkin: [i think before falcon, back when they were xtensa chips, there might have been a possibility to do that, but i don't think that's the case anymore]
06:04 Lightsword: spacecraft often run off of a RTOS like vxworks
06:04 Lightsword: which allows for multi-tasking
06:05 imirkin: yeah, that's a real RTOS :)
06:05 Lightsword: if it’s just an application running by itself on bare metal seems a bit odd to call it an operating system
06:05 imirkin: fair enough.
06:08 Lightsword: imirkin, you can even run linux side by side with a bare metal application on some xilinx boards https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18841668/Multi-OS+Support+AMP+Hypervisor#Multi-OSSupport(AMP&Hypervisor)-Linux/Bare-metalAMP
06:09 imirkin: heh
06:09 imirkin: i remember getting very confused by how to operate a virtex2pro with the ppc cores inside it
06:09 Lightsword: there’s some weird hacks like that some people use to get around linux not having accurate timing
06:10 Lightsword: usually you would use the bare metal app for IO
10:03 RSpliet: Lightsword: I don't think there's a very strict definition of what constitutes an RTOS. It's all about predictable scheduling of the tasks at hand, being able to provide latency guarantees. There's no requirement of kernel/userspace separation or task isolation, so it could technically just be a little library that allows you to define tasks and schedules them round-robin.
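The "little library that defines tasks and schedules them round-robin" idea can be illustrated with a minimal cooperative scheduler. This is a sketch under the assumption that every task yields promptly; it has no kernel/userspace separation and provides no latency guarantees on its own. The task names are made up.

```python
from collections import deque


def task(name, steps, trace):
    """A cooperative task: do one unit of work, then yield to the scheduler."""
    for i in range(steps):
        trace.append(f"{name}:{i}")
        yield


def run_round_robin(tasks):
    """Give each ready task one turn in order until all have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue  # task finished; drop it from the rotation
        ready.append(current)


trace = []
run_round_robin([task("a", 2, trace), task("b", 3, trace)])
```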
16:02 karolherbst: imirkin: mhh, I see some fails with the webgl test suite, all/conformance/context stuff
16:11 karolherbst: ohh uhm.. could have been local changes
17:19 imirkin: karolherbst: dunno about context stuff, but i've already debugged many of the failures
17:19 imirkin: e.g. the glBlitFramebuffer stuff is due to pipe_blit_info cutting stuff off
17:22 HdkR: Was that in Nouveau specific code or common code?
17:22 karolherbst: I think the context stuff was caused by my mt fixes. Doesn't happen on master
17:23 karolherbst: imirkin: did you look into the deqp/functional/gles3/fborenderer fails?
17:23 imirkin: probably. but i don't remember off-hand
17:23 imirkin: HdkR: st/mesa
17:23 imirkin: the interface doesn't support passing the proper values
17:24 imirkin: since pipe_box's width/height were reduced to 16-bit
17:24 karolherbst: imirkin: sounded like somebody thought we will never get values bigger than 16 bit...
17:24 imirkin: i talked briefly with mareko about it, he suggested doing something else with pipe_blit_info and leaving pipe_box alone
17:24 karolherbst: but we don't really have to adjust the interface for that, do we?
17:25 karolherbst: imirkin: couldn't we just fail the glBlitFramebuffer call if the box is out of bounds? or something?
17:25 imirkin: that's not how it's specified.
17:26 imirkin: basically it says to blit x=0,y=0,width=0x7fffffff,height=0x7fffffff
17:26 imirkin: which is meant to work
17:26 imirkin: based on how it's specified
17:26 karolherbst: mhh, so we clamp the values to 16 bit, no?
17:27 karolherbst: from my understanding we shouldn't be able to get to a situation where a bigger value makes any sense
17:27 imirkin: no, we just take the low 16-bit
17:27 imirkin: welll ...
17:27 karolherbst: how would that be legal?
17:28 imirkin: i'd encourage you to re-read how glBlitFramebuffer works then
17:28 imirkin: (it's not legal)
17:28 imirkin: (but neither would be clamping)
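The difference between taking the low 16 bits and clamping can be checked with a little arithmetic. The sketch below assumes the narrowed fields are signed 16-bit integers, which the log does not state, and the helper names and the scaled-blit widths are hypothetical. It also shows that clamping would change the source-to-destination ratio of a scaled blit.

```python
def low16_signed(value: int) -> int:
    """Keep only the low 16 bits and reinterpret them as a signed 16-bit integer."""
    low = value & 0xFFFF
    return low - 0x10000 if low & 0x8000 else low


def clamp16_signed(value: int) -> int:
    """Saturate the value to the signed 16-bit range."""
    return max(-0x8000, min(0x7FFF, value))


requested = 0x7FFFFFFF                 # width/height from the blit call
truncated = low16_signed(requested)    # low bits are 0xFFFF, i.e. -1 as signed
clamped = clamp16_signed(requested)    # saturates to 0x7FFF

# Clamping both sides of a scaled blit changes the scale factor.
src_w, dst_w = 100_000, 50_000
true_scale = src_w / dst_w
clamped_scale = clamp16_signed(src_w) / clamp16_signed(dst_w)
```

Neither result preserves what the caller asked for: truncation produces a meaningless size, and clamping quietly turns a 2:1 blit into a 1:1 blit.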
17:29 karolherbst: this all depends on whether we can have framebuffers with bounds outside the 16 bit value range, or not? If we can have that, then I don't see why the box shouldn't be changed to hold 32 bit values
17:31 karolherbst: but uhm.. isn't blitting "x=0,y=0,width=0x7fffffff,height=0x7fffffff" undefined to begin with as you always have overlapping src/dests?
17:32 karolherbst: ohh uhm... that only counts if read/draw buffers are the same
17:32 imirkin: source and dest are different
17:32 imirkin: pipe_box is used in other places
17:32 imirkin: this is why marek was suggesting doing something special for pipe_draw
17:32 imirkin: er, pipe_blit_info
17:32 imirkin: [that was an odd typo...]
17:33 karolherbst: ohhh, I see. so we could rather add a pipe_box_32 to pipe_blit_info essentially
17:33 karolherbst: which is rather annoying though
17:33 karolherbst: :/
17:34 imirkin: sure
17:35 imirkin: i just fixed the div-by-zero that resulted from all this and moved on to the next thing :)
17:35 karolherbst: :D
17:35 karolherbst: I am still wondering if we can get framebuffers being bigger than 0xffff x 0xffff
17:36 karolherbst: (but I guess silly things lead to special situations where this assumption doesn't hold up anymore)
19:33 karolherbst: imirkin: so I got around 0.6% subtest fails
20:04 imirkin: karolherbst: the fb can't be. but the blit parameters can. there's weird scaling implications, so you can't just clamp it.
20:04 imirkin: karolherbst: there are fairly few failures. many of them are about the stupid goddamn rgba4 situation
20:06 karolherbst: k
20:06 imirkin: a few i've fixed already, not sure if i cc'd to stable or not
20:07 imirkin: [i'm pretty disillusioned with the stable process, so i've not been worrying too much about it]
20:16 karolherbst: I usually add fixes tags and those commits are picked up automatically
20:17 karolherbst: but yeah, doesn't work for things which aren't regressions
20:18 karolherbst: imirkin: are there any fixes not applied to master?
20:20 imirkin: let me check
20:20 imirkin: no
20:20 imirkin: at least nothing obvious
20:21 karolherbst: well I will rerun the test with bgra4 disabled and see how that goes
20:23 karolherbst: imirkin: I got a timeout for the bptc compression test
20:24 karolherbst: those timeouts are a bit weirdo anyway
20:24 imirkin: some tests are slow. dunno what the deal is with those
20:24 imirkin: the test suite is pretty flakey
20:33 karolherbst: imirkin: it seems like the page fails to load and the test suite gets no response then :/
20:34 karolherbst: maybe running that stuff locally works better